Mirror of https://github.com/postgres/postgres.git (synced 2025-04-29 13:56:47 +03:00)
Doc: add a little about LACON execution to src/backend/regex/README.
I wrote this while thinking about a possible optimization, but it's a useful description of the existing code regardless of whether the optimization ever happens. So push it separately.
parent 375aed36ad
commit 10d58228bb
@@ -438,3 +438,36 @@ BOS/BOL/EOS/EOL adjacent to the pre-state and post-state. So a finished
NFA for a pattern without anchors or adjacent-character constraints will
have pre-state outarcs for RAINBOW (all possible character colors) as well
as BOS and BOL, and likewise post-state inarcs for RAINBOW, EOS, and EOL.
Also note that LACON arcs will never connect to the pre-state
or post-state.


Look-around constraints (LACONs)
--------------------------------

The regex compiler doesn't have much intelligence about LACONs; it just
constructs a sub-NFA representing the pattern that the constraint says to
match or not match, and puts a LACON arc referencing that sub-NFA into the
main NFA. At runtime, the executor applies the sub-NFA at each point in
the string where the constraint is relevant, and then traverses or doesn't
traverse the arc. ("Traversal" means including the arc's to-state in the
set of NFA states that are considered active at the next character.)

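As a rough illustration of the arrangement just described, the following C
sketch models a main NFA whose LACON arcs refer to a sub-NFA by index. All
type, field, and function names here (ToyNfa, ToyArc, ToyLacon,
toy_add_lacon_arc, toy_lacon_traversable) are hypothetical and much simpler
than the real structures in src/backend/regex; the sketch only shows the
shape of the relationship, assuming fixed-size arrays and a single
"must match / must not match" flag per constraint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; NOT the real regex data structures. */
#define TOY_MAX_ARCS   16
#define TOY_MAX_LACONS 4

typedef enum { TOY_PLAIN, TOY_LACON } ToyArcType;

typedef struct
{
    ToyArcType  type;
    int         from;       /* source state number */
    int         to;         /* target state number */
    int         label;      /* TOY_PLAIN: color; TOY_LACON: index into lacons[] */
} ToyArc;

struct ToyNfa;

typedef struct
{
    const struct ToyNfa *sub;   /* sub-NFA for the constraint's pattern */
    int         must_match;     /* 1 = pattern must match, 0 = must not match */
} ToyLacon;

typedef struct ToyNfa
{
    ToyArc      arcs[TOY_MAX_ARCS];
    int         narcs;
    ToyLacon    lacons[TOY_MAX_LACONS];
    int         nlacons;
} ToyNfa;

/*
 * Record a constraint and add a LACON arc from->to that references it.
 * Returns the new arc's index, or -1 if either array is full.
 */
static int
toy_add_lacon_arc(ToyNfa *nfa, int from, int to,
                  const ToyNfa *sub, int must_match)
{
    ToyArc     *arc;

    if (nfa->narcs >= TOY_MAX_ARCS || nfa->nlacons >= TOY_MAX_LACONS)
        return -1;
    nfa->lacons[nfa->nlacons].sub = sub;
    nfa->lacons[nfa->nlacons].must_match = must_match;
    arc = &nfa->arcs[nfa->narcs];
    arc->type = TOY_LACON;
    arc->from = from;
    arc->to = to;
    arc->label = nfa->nlacons++;
    return nfa->narcs++;
}

/*
 * Given whether the sub-NFA matched at the current position, decide
 * whether the LACON arc may be traversed.
 */
static int
toy_lacon_traversable(const ToyNfa *nfa, const ToyArc *arc, int sub_matched)
{
    assert(arc->type == TOY_LACON);
    return (sub_matched != 0) == (nfa->lacons[arc->label].must_match != 0);
}
```

In this sketch a negative constraint (must_match = 0) is traversable exactly
when the sub-NFA fails to match, which mirrors the "match or not match"
wording above.
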
The actual basic matching cycle of the executor is
1. Identify the color of the next input character, then advance over it.
2. Apply the DFA to follow all the matching "plain" arcs of the NFA.
   (Notionally, the previous DFA state represents the set of states the
   NFA could have been in before the character, and the new DFA state
   represents the set of states the NFA could be in after the character.)
3. If there are any LACON arcs leading out of any of the new NFA states,
   apply each LACON constraint starting from the new next input character
   (while not actually consuming any input). For each successful LACON,
   add its to-state to the current set of NFA states. If any such
   to-state has outgoing LACON arcs, process those in the same way.
   (Mathematically speaking, we compute the transitive closure of the
   set of states reachable by successful LACONs.)

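The three steps above can be sketched in C as follows. This is a toy model
under several stated assumptions, not the executor's real code: the set of
active NFA states is a plain bitmask rather than a cached DFA state, a
"color" is just the input byte itself, state 0 plays the role of the
pre-state without any BOS pseudo-character, and each LACON is a callback
(ToyCheck) standing in for running the constraint's sub-NFA at a position.
All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for running a LACON's sub-NFA at input[pos]; consumes nothing. */
typedef int (*ToyCheck) (const char *input, size_t pos);

typedef struct { int from; int to; char color; } ToyPlainArc;
typedef struct { int from; int to; ToyCheck check; } ToyLaconArc;

typedef struct
{
    const ToyPlainArc *plain;
    size_t      nplain;
    const ToyLaconArc *lacon;
    size_t      nlacon;
} ToyNfa;

/* Step 3: add to-states of successful LACONs until nothing changes. */
static unsigned
toy_lacon_closure(const ToyNfa *nfa, unsigned states,
                  const char *input, size_t pos)
{
    int         changed = 1;
    size_t      i;

    while (changed)
    {
        changed = 0;
        for (i = 0; i < nfa->nlacon; i++)
        {
            const ToyLaconArc *a = &nfa->lacon[i];

            if ((states & (1u << a->from)) && !(states & (1u << a->to)) &&
                a->check(input, pos))
            {
                states |= 1u << a->to;
                changed = 1;    /* new state may have LACON out-arcs too */
            }
        }
    }
    return states;
}

/* One matching cycle: steps 1, 2, and 3. */
static unsigned
toy_step(const ToyNfa *nfa, unsigned states, const char *input, size_t *pos)
{
    char        color;
    unsigned    next = 0;
    size_t      i;

    assert(nfa != NULL && input != NULL && pos != NULL);
    color = input[*pos];        /* step 1: identify color, then advance */
    (*pos)++;
    for (i = 0; i < nfa->nplain; i++)   /* step 2: follow plain arcs */
    {
        const ToyPlainArc *a = &nfa->plain[i];

        if ((states & (1u << a->from)) && a->color == color)
            next |= 1u << a->to;
    }
    return toy_lacon_closure(nfa, next, input, *pos);  /* step 3 */
}

static int next_is_b(const char *s, size_t pos) { return s[pos] == 'b'; }
static int next_is_not_c(const char *s, size_t pos) { return s[pos] != 'c'; }

/* Toy NFA for something like a(?=b)(?!c)b; returns the final state set. */
static unsigned
toy_demo(const char *input)
{
    static const ToyPlainArc plain[] = {{0, 1, 'a'}, {3, 4, 'b'}};
    /* Deliberately listed out of order, so one pass is not enough. */
    static const ToyLaconArc lacon[] = {{2, 3, next_is_not_c},
                                        {1, 2, next_is_b}};
    ToyNfa      nfa = {plain, 2, lacon, 2};
    unsigned    states = 1u << 0;
    size_t      pos = 0;

    while (input[pos] != '\0' && states != 0)
        states = toy_step(&nfa, states, input, &pos);
    return states;
}
```

With this toy NFA, toy_demo("ab") ends in state 4 only, because both LACONs
succeed after the 'a' and the chained arcs are picked up by the closure
loop; toy_demo("ac") ends with no active states, since the first LACON
fails and nothing can consume the 'c'.
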
Thus, LACONs are always checked immediately after consuming a character
via a plain arc. This is okay because the NFA's "pre" state only has
plain out-arcs, so we can always consume a character (possibly a BOS
pseudo-character as described above) before we need to worry about LACONs.