mirror of https://github.com/postgres/postgres.git synced 2025-05-02 11:44:50 +03:00

Docs: add an explicit example about controlling overall greediness of REs.

Per discussion of bug #13538.
Tom Lane 2015-08-04 21:09:12 -04:00
parent dae6e46012
commit 4eb4e71119


@ -4956,10 +4956,37 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
The quantifiers <literal>{1,1}</> and <literal>{1,1}?</>
can be used to force greediness or non-greediness, respectively,
on a subexpression or a whole RE.
This is useful when you need the whole RE to have a greediness attribute
different from what's deduced from its elements. As an example,
suppose that we are trying to separate a string containing some digits
into the digits and the parts before and after them. We might try to
do that like this:
<screen>
SELECT regexp_matches('abc01234xyz', '(.*)(\d+)(.*)');
<lineannotation>Result: </lineannotation><computeroutput>{abc0123,4,xyz}</computeroutput>
</screen>
That didn't work: the first <literal>.*</> is greedy so
it <quote>eats</> as much as it can, leaving the <literal>\d+</> to
match at the last possible place, the last digit. We might try to fix
that by making it non-greedy:
<screen>
SELECT regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)');
<lineannotation>Result: </lineannotation><computeroutput>{abc,0,""}</computeroutput>
</screen>
That didn't work either, because now the RE as a whole is non-greedy
and so it ends the overall match as soon as possible. We can get what
we want by forcing the RE as a whole to be greedy:
<screen>
SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
<lineannotation>Result: </lineannotation><computeroutput>{abc,01234,xyz}</computeroutput>
</screen>
Controlling the RE's overall greediness separately from its components'
greediness allows great flexibility in handling variable-length patterns.
</para>
<para>
When deciding what is a longer or shorter match,
match lengths are measured in characters, not collating elements.
An empty string is considered longer than no match at all.
For example:
<literal>bb*</>