mirror of
https://github.com/postgres/postgres.git
synced 2025-12-09 02:08:45 +03:00
Fix the general case of quantified regex back-references.
Cases where a back-reference is part of a larger subexpression that is quantified have never worked in Spencer's regex engine, because he used a compile-time transformation that neglected the need to check the back-reference match in iterations before the last one. (That was okay for capturing parens, and we still do it if the regex has *only* capturing parens ... but it's not okay for backrefs.)

To make this work properly, we have to add an "iteration" node type to the regex engine's vocabulary of sub-regex nodes. Since this is a moderately large change with a fair risk of introducing new bugs of its own, apply to HEAD only, even though it's a fix for a longstanding bug.
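To illustrate the failure mode described above, here is a minimal sketch using Python's `re` module. It is a backtracking engine unrelated to Spencer's code, and it is used only to show what a quantified back-reference has to mean. The pattern and the helper `last_iteration_only` are hypothetical, chosen for this note; the helper is a loose model of the bug, not a reproduction of the old compile-time transformation.

```python
import re

# Each iteration of the outer group captures one character and then
# requires the same character again via the back-reference \2.
PATTERN = re.compile(r"((.)\2)+")


def matches(s: str) -> bool:
    """Correct semantics: every iteration must satisfy its own back-reference."""
    return PATTERN.fullmatch(s) is not None


def last_iteration_only(s: str) -> bool:
    """Hypothetical model of the bug: only the final iteration checks the
    back-reference; earlier iterations accept any two characters."""
    return re.fullmatch(r"(?:..)*(.)\1", s) is not None
```

Under these assumptions, `"abcc"` is the interesting input: `matches("abcc")` is false because the first iteration `"ab"` fails its back-reference, while `last_iteration_only("abcc")` is true because only the trailing `"cc"` is checked.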
@@ -102,15 +102,15 @@ consists of a tree of sub-expressions ("subre"s). Leaf tree nodes are
 either plain regular expressions (which are executed as DFAs in the manner
 described above) or back-references (which try to match the input to some
 previous substring). Non-leaf nodes are capture nodes (which save the
-location of the substring currently matching their child node) or
-concatenation or alternation nodes. At execution time, the executor
-recursively scans the tree. At concatenation or alternation nodes,
-it considers each possible alternative way of matching the input string,
-ie each place where the string could be split for a concatenation, or each
-child node for an alternation. It tries the next alternative if the match
-fails according to the child nodes. This is exactly the sort of
-backtracking search done by a traditional NFA regex engine. If there are
-many tree levels it can get very slow.
+location of the substring currently matching their child node),
+concatenation, alternation, or iteration nodes. At execution time, the
+executor recursively scans the tree. At concatenation, alternation, or
+iteration nodes, it considers each possible alternative way of matching the
+input string, that is each place where the string could be split for a
+concatenation or iteration, or each child node for an alternation. It
+tries the next alternative if the match fails according to the child nodes.
+This is exactly the sort of backtracking search done by a traditional NFA
+regex engine. If there are many tree levels it can get very slow.
 
 But all is not lost: we can still be smarter than the average pure NFA
 engine. To do this, each subre node has an associated DFA, which
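The tree walk that the README text describes can be sketched as follows. This is a toy model under stated assumptions, not the engine's C code: the class names (`Leaf`, `Backref`, `Capture`, `Concat`, `Alt`, `Iter`) and the functions `ways` and `tree_match` are invented for illustration, Python's `re` stands in for the DFA that executes a leaf, and each iteration is required to consume at least one character to keep the search finite.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:            # plain regex; a DFA in the real engine, `re` here
    pattern: str

@dataclass
class Backref:         # must equal the text saved for an earlier capture
    group: int

@dataclass
class Capture:         # saves the substring matched by its child
    group: int
    child: object

@dataclass
class Concat:
    left: object
    right: object

@dataclass
class Alt:
    children: list

@dataclass
class Iter:            # the node type this commit adds to the vocabulary
    child: object
    min: int
    max: Optional[int]  # None means unbounded


def ways(node, s, lo, hi, caps):
    """Yield a capture dict for each way `node` can match exactly s[lo:hi]."""
    if isinstance(node, Leaf):
        if re.fullmatch(node.pattern, s[lo:hi], re.DOTALL) is not None:
            yield caps
    elif isinstance(node, Backref):
        if node.group in caps and caps[node.group] == s[lo:hi]:
            yield caps
    elif isinstance(node, Capture):
        for c in ways(node.child, s, lo, hi, caps):
            yield {**c, node.group: s[lo:hi]}
    elif isinstance(node, Concat):
        # Try every place where the string could be split in two.
        for mid in range(lo, hi + 1):
            for c in ways(node.left, s, lo, mid, caps):
                yield from ways(node.right, s, mid, hi, c)
    elif isinstance(node, Alt):
        # Try each child node in turn.
        for child in node.children:
            yield from ways(child, s, lo, hi, caps)
    elif isinstance(node, Iter):
        yield from _iterate(node, s, lo, hi, 0, caps)
    else:
        raise TypeError(f"unknown node type: {node!r}")


def _iterate(node, s, pos, hi, count, caps):
    """Split s[pos:hi] into further iterations, matching the child on each."""
    if pos == hi and count >= node.min:
        yield caps
    if node.max is None or count < node.max:
        # Simplification: every iteration must consume at least one character.
        for mid in range(pos + 1, hi + 1):
            for c in ways(node.child, s, pos, mid, caps):
                yield from _iterate(node, s, mid, hi, count + 1, c)


def tree_match(node, s):
    """Return the captures of the first successful match, or None."""
    return next(ways(node, s, 0, len(s), {}), None)


# Tree for the regex ((.)\2)+ : the back-reference sits inside the iteration.
TREE = Iter(Capture(1, Concat(Capture(2, Leaf(".")), Backref(2))), 1, None)
```

Because the child subtree is re-run for every iteration, the back-reference is checked each time: `tree_match(TREE, "aabb")` succeeds, while `tree_match(TREE, "abcc")` returns `None`. The nested loops also show why this search can get very slow when the tree has many levels.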