From 4eb4e71119780a15168d03c933f41bc04b1ecd4e Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Tue, 4 Aug 2015 21:09:12 -0400 Subject: [PATCH] Docs: add an explicit example about controlling overall greediness of REs. Per discussion of bug #13538. --- doc/src/sgml/func.sgml | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b454f836854..87c3e5a1ee1 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -4956,10 +4956,37 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); The quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness, respectively, on a subexpression or a whole RE. + This is useful when you need the whole RE to have a greediness attribute + different from what's deduced from its elements. As an example, + suppose that we are trying to separate a string containing some digits + into the digits and the parts before and after them. We might try to + do that like this: + +SELECT regexp_matches('abc01234xyz', '(.*)(\d+)(.*)'); +Result: {abc0123,4,xyz} + + That didn't work: the first .* is greedy so + it eats as much as it can, leaving the \d+ to + match at the last possible place, the last digit. We might try to fix + that by making it non-greedy: + +SELECT regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)'); +Result: {abc,0,""} + + That didn't work either, because now the RE as a whole is non-greedy + and so it ends the overall match as soon as possible. We can get what + we want by forcing the RE as a whole to be greedy: + +SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); +Result: {abc,01234,xyz} + + Controlling the RE's overall greediness separately from its components' + greediness allows great flexibility in handling variable-length patterns. - Match lengths are measured in characters, not collating elements. + When deciding what is a longer or shorter match, + match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example: bb*