mirror of
https://github.com/postgres/postgres.git
synced 2025-04-20 00:42:27 +03:00
Despite the best efforts of commit 0e5c82380, we're still seeing occasional failures of postgres_fdw's query_cancel test in the buildfarm. Investigation suggests that its 100ms timeout is still not enough to reliably ensure that the remote side starts the query before receiving the cancel request --- and if it hasn't, it will just discard the request because it's idle.

We discussed allowing a cancel request to kill the next-received query, but that would have wide and perhaps unpleasant side-effects. What seems safer is to make postgres_fdw do what a human user would likely do, which is issue another cancel request if the first one didn't seem to do anything. We'll keep the same overall 30 second grace period before concluding things are broken, but issue additional cancel requests after 1 second, then 2 more seconds, then 4, then 8. (The next one in series is 16 seconds, but we'll hit the 30 second timeout before that.)

Having done that, revert the timeout in query_cancel.sql to 10 ms. That will still be enough on most machines, most of the time, for the remote query to start; but now we're intentionally risking the race condition occurring sometimes in the buildfarm, so that the repeat-cancel code path will get some testing.

As before, back-patch to v17. We might eventually contemplate back-patching this further, and/or adding similar logic to dblink. But given the lack of field complaints to date, this feels like mostly an exercise in test case stabilization, so v17 is enough.

Discussion: https://postgr.es/m/colnv3lzzmc53iu5qoawynr6qq7etn47lmggqr65ddtpjliq5d@glkveb4m6nop
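The retry schedule described in the commit message can be checked with a little arithmetic. The following is only an illustrative sketch in Python, not the postgres_fdw C code: the function name `cancel_request_times` and its parameters are hypothetical, and it assumes that elapsed time is measured from the first cancel request and that each wait doubles the previous one.

```python
def cancel_request_times(first_delay=1.0, overall_timeout=30.0):
    """Return the elapsed times (in seconds) at which additional cancel
    requests would be sent, assuming the wait doubles after each one.

    Hypothetical helper for illustration; not part of postgres_fdw.
    """
    times = []
    elapsed = 0.0
    delay = first_delay
    # Stop once the next re-send would land at or beyond the overall timeout.
    while elapsed + delay < overall_timeout:
        elapsed += delay
        times.append(elapsed)
        delay *= 2  # waits of 1, 2, 4, 8, ... seconds
    return times


if __name__ == "__main__":
    # Waits of 1, 2, 4 and 8 seconds put the re-sends at 1, 3, 7 and 15 seconds.
    print(cancel_request_times())
```

Under these assumptions the additional requests fall at 1, 3, 7 and 15 seconds after the first one. A fifth request would need a 16 second wait and so would land at 31 seconds, which is past the 30 second grace period, matching the parenthetical remark in the commit message.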
23 lines
938 B
PL/PgSQL
SELECT version() ~ 'cygwin' AS skip_test \gset
\if :skip_test
\quit
\endif

-- Let's test canceling a remote query. Use a table that does not have
-- remote_estimate enabled, else there will be multiple queries to the
-- remote and we might unluckily send the cancel in between two of them.
-- First let's confirm that the query is actually pushed down.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT count(*) FROM ft1 a CROSS JOIN ft1 b CROSS JOIN ft1 c CROSS JOIN ft1 d;

BEGIN;
-- Make sure that connection is open and set up.
SELECT count(*) FROM ft1 a;
-- On most machines, 10ms will be enough to be sure that we've sent the slow
-- query. We may sometimes exercise the race condition where we send cancel
-- before the remote side starts the query, but that's fine too.
SET LOCAL statement_timeout = '10ms';
-- This would take very long if not canceled:
SELECT count(*) FROM ft1 a CROSS JOIN ft1 b CROSS JOIN ft1 c CROSS JOIN ft1 d;
COMMIT;