From e0f2acf26062c6279b738738ae48e15bb5408c94 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 20 Aug 2021 14:19:04 -0400
Subject: [PATCH] Fix performance bug in regexp's citerdissect/creviterdissect.

After detecting a sub-match "dissect" failure (i.e., a backref match
failure) in the i'th sub-match of an iteration node, we should proceed
by adjusting the attempted length of the i'th sub-match.  As coded,
though, these functions changed the attempted length of the *last*
sub-match, and only after exhausting all possibilities for that would
they back up to adjust the next-to-last sub-match, then the one before
that, and so on.  All of that is wasted effort, since only changing the
start or length of the i'th sub-match can possibly make it succeed.
This oversight makes exponentially bad performance possible.
Fortunately, the problem is masked in most cases by optimizations or
constraints applied elsewhere, which explains why we'd not noticed it
before.  But it is possible to reach the problem with fairly simple,
if contrived, regexps.

Oversight in my commit 173e29aa5.  That's pretty ancient now,
so back-patch to all supported branches.

Discussion: https://postgr.es/m/1808998.1629412269@sss.pgh.pa.us
---
 src/backend/regex/regexec.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)
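
The cost difference described in the commit message can be illustrated
with a toy model.  This is only a sketch under stated assumptions, not
the regexec.c code: the cheap "DFA" step is assumed to accept any
non-empty piece, verify() is a hypothetical stand-in for the recursive
dissect check (it passes only one-character pieces), and LEN, checks,
split() and backtrack_failed are illustrative names that do not exist
in the source tree.

```c
#include <assert.h>
#include <stdbool.h>

#define LEN 12				/* toy input length (hypothetical) */

static long checks;			/* number of expensive verify() calls */

/*
 * Hypothetical stand-in for the recursive dissect check: a piece
 * verifies only if it is exactly one character long.
 */
static bool
verify(int start, int end)
{
	checks++;
	return end - start == 1;
}

/*
 * Divide [0, LEN) into non-empty pieces, longest first, until every
 * piece verifies.  If backtrack_failed is true, a failure in piece i
 * backtracks piece i; otherwise it backtracks the last piece.
 * Returns the number of pieces, or 0 if no division verifies.
 */
static int
split(bool backtrack_failed)
{
	int			endpts[LEN + 1];
	int			nverified = 0;
	int			k = 1;
	int			limit = LEN;
	int			i;

	endpts[0] = 0;
	checks = 0;
	while (k > 0)
	{
		/* toy "DFA": any non-empty piece is fine, so take the longest */
		assert(limit > endpts[k - 1]);
		endpts[k] = limit;

		/* k'th piece moved, so it is no longer known to be verified */
		if (nverified >= k)
			nverified = k - 1;

		if (endpts[k] != LEN)
		{
			/* haven't reached the end yet, add another piece */
			k++;
			limit = LEN;
			continue;
		}

		/* slow part: verify each piece not already known okay */
		for (i = nverified + 1; i <= k; i++)
		{
			if (!verify(endpts[i - 1], endpts[i]))
				break;
			nverified = i;
		}
		if (i > k)
			return k;		/* every piece verified */

		/* piece i failed; choose which piece to backtrack */
		if (backtrack_failed)
			k = i;

		/* shorten the k'th piece, or failing that an earlier one */
		while (k > 0)
		{
			if (endpts[k] - 1 > endpts[k - 1])
			{
				limit = endpts[k] - 1;
				break;
			}
			k--;
		}
	}
	return 0;
}
```

Under these assumptions, split(true) finishes after 78 verify() calls
for LEN = 12 (12 + 11 + ... + 1).  split(false) reaches the same
division, but it first walks through the alternative divisions of the
tail before it shortens the piece that actually failed, so its call
count grows exponentially with LEN.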

diff --git a/src/backend/regex/regexec.c b/src/backend/regex/regexec.c
index f7eaa76b02c..47043badf77 100644
--- a/src/backend/regex/regexec.c
+++ b/src/backend/regex/regexec.c
@@ -1098,8 +1098,8 @@ citerdissect(struct vars *v,
 	 * Our strategy is to first find a set of sub-match endpoints that are
 	 * valid according to the child node's DFA, and then recursively dissect
 	 * each sub-match to confirm validity.  If any validity check fails,
-	 * backtrack the last sub-match and try again.  And, when we next try for
-	 * a validity check, we need not recheck any successfully verified
+	 * backtrack that sub-match and try again.  And, when we next try for a
+	 * validity check, we need not recheck any successfully verified
 	 * sub-matches that we didn't move the endpoints of.  nverified remembers
 	 * how many sub-matches are currently known okay.
 	 */
@@ -1187,12 +1187,13 @@ citerdissect(struct vars *v,
 			return REG_OKAY;
 		}
 
-		/* match failed to verify, so backtrack */
+		/* i'th match failed to verify, so backtrack it */
+		k = i;
 
 backtrack:
 
 		/*
-		 * Must consider shorter versions of the current sub-match.  However,
+		 * Must consider shorter versions of the k'th sub-match.  However,
 		 * we'll only ask for a zero-length match if necessary.
 		 */
 		while (k > 0)
@@ -1299,8 +1300,8 @@ creviterdissect(struct vars *v,
 	 * Our strategy is to first find a set of sub-match endpoints that are
 	 * valid according to the child node's DFA, and then recursively dissect
 	 * each sub-match to confirm validity.  If any validity check fails,
-	 * backtrack the last sub-match and try again.  And, when we next try for
-	 * a validity check, we need not recheck any successfully verified
+	 * backtrack that sub-match and try again.  And, when we next try for a
+	 * validity check, we need not recheck any successfully verified
 	 * sub-matches that we didn't move the endpoints of.  nverified remembers
 	 * how many sub-matches are currently known okay.
 	 */
@@ -1394,12 +1395,13 @@ creviterdissect(struct vars *v,
 			return REG_OKAY;
 		}
 
-		/* match failed to verify, so backtrack */
+		/* i'th match failed to verify, so backtrack it */
+		k = i;
 
 backtrack:
 
 		/*
-		 * Must consider longer versions of the current sub-match.
+		 * Must consider longer versions of the k'th sub-match.
 		 */
 		while (k > 0)
 		{