Wake up for latches in CheckpointWriteDelay().

The checkpointer shouldn't ignore its latch. Other backends may be waiting for it to drain the request queue. Hopefully real systems don't have a full queue often, but the condition is reached easily when shared_buffers is small. This involves defining a new wait event, which will appear in the pg_stat_activity view often due to spread checkpoints. Back-patch only to 14. Even though the problem exists in earlier branches too, it's hard to hit there. In 14 we stopped using signal handlers for latches on Linux, *BSD and macOS, which were previously hiding this problem by interrupting the sleep (though not reliably, as the signal could arrive before the sleep begins; precisely the problem latches address). Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de
2025-10-18 04:29:09 +03:00 · 2022-03-16 13:37:58 +13:00
parent a56e7b6601
commit 5e6368b42e
4 changed files with 15 additions and 1 deletions
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -484,6 +484,9 @@ CheckpointerMain(void)
 			}

 			ckpt_active = false;
+
+			/* We may have received an interrupt during the checkpoint. */
+			HandleCheckpointerInterrupts();
 		}

 		/* Check for archive_timeout and switch xlog files if necessary. */
@@ -726,7 +729,10 @@ CheckpointWriteDelay(int flags, double progress)
 		 * Checkpointer and bgwriter are no longer related so take the Big
 		 * Sleep.
 		 */
-		pg_usleep(100000L);
+		WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH | WL_TIMEOUT,
+				  100,
+				  WAIT_EVENT_CHECKPOINT_WRITE_DELAY);
+		ResetLatch(MyLatch);
 	}
 	else if (--absorb_counter <= 0)
 	{