From 2e211c16612a127bea861e416bdc22ff39c54866 Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 10 Feb 2022 11:52:20 -0500
Subject: [PATCH] Make timeout.c more robust against missed timer interrupts.

Commit 09cf1d522 taught schedule_alarm() to not do anything if
the next requested event is after when we expect the next interrupt
to fire.  However, if somehow an interrupt gets lost, we'll continue
to not do anything indefinitely, even after the "next interrupt" time
is obviously in the past.  Thus, one missed interrupt can break
timeout scheduling for the life of the session.  Michael Harris
reported a scenario where a bug in a user-defined function caused this
to happen, so you don't even need to assume kernel bugs exist to think
this is worth fixing.  We can make things more robust at little cost
by detecting the case where signal_due_at is before "now" and forcing
a new setitimer call to occur.  This isn't a completely bulletproof
fix of course; but in our typical usage pattern where we frequently set
timeouts and clear them before they are reached, the interrupt will
get re-enabled after at most one timeout interval, which with a little
luck will be before we really need it.

While here, let's mark signal_due_at as volatile, since the signal
handler can both examine and set it.  I'm not sure there's any
actual risk given that signal_pending is already volatile, but
it's surely questionable.

Backpatch to v14 where this logic came in.

Michael Harris and Tom Lane

Discussion: https://postgr.es/m/CADofcAWbMrvgwSMqO4iG_iD3E2v8ZUrC-_crB41my=VMM02-CA@mail.gmail.com
---
 src/backend/utils/misc/timeout.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
index 95a273d9cfb..b79df174da5 100644
--- a/src/backend/utils/misc/timeout.c
+++ b/src/backend/utils/misc/timeout.c
@@ -70,11 +70,12 @@ static volatile sig_atomic_t alarm_enabled = false;
 
 /*
  * State recording if and when we next expect the interrupt to fire.
+ * (signal_due_at is valid only when signal_pending is true.)
  * Note that the signal handler will unconditionally reset signal_pending to
  * false, so that can change asynchronously even when alarm_enabled is false.
  */
 static volatile sig_atomic_t signal_pending = false;
-static TimestampTz signal_due_at = 0;	/* valid only when signal_pending */
+static volatile TimestampTz signal_due_at = 0;
 
 
 /*****************************************************************************
@@ -214,10 +215,22 @@ schedule_alarm(TimestampTz now)
 
 		MemSet(&timeval, 0, sizeof(struct itimerval));
 
+		/*
+		 * If we think there's a signal pending, but current time is more than
+		 * 10ms past when the signal was due, then assume that the timeout
+		 * request got lost somehow; clear signal_pending so that we'll reset
+		 * the interrupt request below.  (10ms corresponds to the worst-case
+		 * timeout granularity on modern systems.)  It won't hurt us if the
+		 * interrupt does manage to fire between now and when we reach the
+		 * setitimer() call.
+		 */
+		if (signal_pending && now > signal_due_at + 10 * 1000)
+			signal_pending = false;
+
 		/*
 		 * Get the time remaining till the nearest pending timeout.  If it is
-		 * negative, assume that we somehow missed an interrupt, and force
-		 * signal_pending off.  This gives us a chance to recover if the
+		 * negative, assume that we somehow missed an interrupt, and clear
+		 * signal_pending.  This gives us another chance to recover if the
 		 * kernel drops a timeout request for some reason.
 		 */
 		nearest_timeout = active_timeouts[0]->fin_time;