diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 780872f7a10..730ac35b48b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5187,10 +5187,37 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); The quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness, respectively, on a subexpression or a whole RE. + This is useful when you need the whole RE to have a greediness attribute + different from what's deduced from its elements. As an example, + suppose that we are trying to separate a string containing some digits + into the digits and the parts before and after them. We might try to + do that like this: + +SELECT regexp_matches('abc01234xyz', '(.*)(\d+)(.*)'); +Result: {abc0123,4,xyz} + + That didn't work: the first .* is greedy so + it eats as much as it can, leaving the \d+ to + match at the last possible place, the last digit. We might try to fix + that by making it non-greedy: + +SELECT regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)'); +Result: {abc,0,""} + + That didn't work either, because now the RE as a whole is non-greedy + and so it ends the overall match as soon as possible. We can get what + we want by forcing the RE as a whole to be greedy: + +SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); +Result: {abc,01234,xyz} + + Controlling the RE's overall greediness separately from its components' + greediness allows great flexibility in handling variable-length patterns. - Match lengths are measured in characters, not collating elements. + When deciding what is a longer or shorter match, + match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example: bb*