mirror of
https://github.com/postgres/postgres.git
synced 2025-05-17 06:41:24 +03:00
By default, wait_for_catchup() waits for the replication connection to reach the primary's write LSN. That's fine, but in an apparent attempt to save one query round-trip, it was coded so that we executed pg_current_wal_lsn() again during each probe query. Thus, we presented the standby with a moving target to be reached. (While the test script itself couldn't be causing the write LSN to advance while it's blocked in wait_for_catchup(), it's plenty plausible that background activity such as autovacuum is emitting more WAL.) That could make the test take longer than necessary, and potentially it could mask bugs by allowing the standby to process more WAL than a strict interpretation of the test scenario allows. So, change wait_for_catchup() to do it "by the book", explicitly collecting the write LSN to wait for at the outset. Also, various call sites were instructing wait_for_catchup() to wait for the standby to reach the primary's insert LSN rather than its write LSN. This also seems like a bad idea. While in most test scenarios those are the same, if they are different then the inserted-but-not-yet-written WAL is not presently available to the standby. The test isn't doing anything to make it become so, so again we have the potential for unwanted test delay, perhaps even a test timeout. (Again, background activity would be needed to make this more than a hypothetical problem.) Hence, change the callers where necessary so that the wait target is always the primary's write LSN. While at it, simplify callers by making use of wait_for_catchup's default arguments wherever possible (the preceding change makes this possible in more places than it was before). And rewrite wait_for_catchup's documentation a bit. Patch by me; thanks to Julien Rouhaud for review. Discussion: https://postgr.es/m/2368336.1641843098@sss.pgh.pa.us