mirror of https://github.com/postgres/postgres.git synced 2025-11-09 06:21:09 +03:00

Fix timing-dependent failure in recovery test 004_timeline_switch

The test introduced by 17b2d5ec75 verifies that a WAL receiver
survives across a timeline jump by searching the server logs for
termination messages.  However, it called restart() before the timeline
switch, which kills the WAL receiver and may log the exact message being
checked, causing the test to fail.  As TAP tests reuse the same log file
across restarts, rotate_logfile() is now called before the restart so
that the log-matching check is not affected by entries generated by the
previous shutdown.
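The log-scoping idea can be illustrated outside the TAP framework. The script below is a minimal, self-contained simulation and does not use PostgreSQL::Test::Cluster. The helpers `file_contains` and `append_log`, the file names and the log lines are hypothetical stand-ins for `log_contains()` and real server output. The model assumes that the stopping server keeps writing to the old log file while the restarted server writes to the new one. Under that assumption, the shutdown-time termination message is matched when a single shared file is searched and is not matched when only the rotated file is searched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical stand-in for a log_contains()-style check: returns 1 if any
# line of $path matches $pattern, 0 otherwise.
sub file_contains {
    my ($path, $pattern) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    while (my $line = <$fh>) {
        if ($line =~ $pattern) {
            close $fh;
            return 1;
        }
    }
    close $fh;
    return 0;
}

# Hypothetical helper: append one line to a log file.
sub append_log {
    my ($path, $msg) = @_;
    open my $fh, '>>', $path or die "cannot open $path: $!";
    print {$fh} "$msg\n";
    close $fh;
}

my $dir        = tempdir(CLEANUP => 1);
my $shared_log = File::Spec->catfile($dir, 'shared.log');           # no rotation
my $old_log    = File::Spec->catfile($dir, 'before_rotation.log');  # stopping server
my $new_log    = File::Spec->catfile($dir, 'after_rotation.log');   # restarted server

# Illustrative pattern and messages, modeled on the termination check.
my $pattern    = qr/FATAL: .* terminating walreceiver process due to administrator command/;
my $terminated = 'FATAL:  terminating walreceiver process due to administrator command';
my $switched   = 'LOG:  new target timeline is 2';

# Without rotation: shutdown output and post-restart output share one file.
append_log($shared_log, $terminated);
append_log($shared_log, $switched);

# With rotation: the shutdown message lands in the old file only.
append_log($old_log, $terminated);
append_log($new_log, $switched);

my $false_failure = file_contains($shared_log, $pattern);
my $rotated_hit   = file_contains($new_log, $pattern);

print "without rotation, pattern found: $false_failure\n";
print "with rotation, pattern found: $rotated_hit\n";
```

The first check reports a match even though the WAL receiver in this model survived the timeline switch. The second check only sees what the restarted server logged.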

While testing another patch, recent changes to file handle inheritance
altered I/O timing enough to make this test fail consistently.

While at it, this commit adds an extra check that compares the WAL
receiver's PID before and after the timeline switch.  This check may
lead to false positives, as the WAL receiver could already have
processed the timeline jump before the initial PID is grabbed, but it
should be good enough in most cases.
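The false-positive window of the PID comparison can be shown with a small timing model. This is a hypothetical sketch, not test code. The event history, the PIDs and the helpers `pid_at` and `pid_check_passes` are invented for illustration. In the model the receiver is replaced at t=5, so it does not survive the jump. The check catches this only if the first PID sample is taken before the replacement.

```perl
use strict;
use warnings;

# Hypothetical history of the WAL receiver: each entry is
# [time at which a process started, its PID]. The process is replaced at
# t=5, i.e. the receiver did NOT survive the timeline jump.
my @history = ([0, 4101], [5, 4230]);

# PID of the receiver alive at time $t (latest start at or before $t).
sub pid_at {
    my ($t, $history) = @_;
    my $pid;
    for my $entry (@$history) {
        $pid = $entry->[1] if $entry->[0] <= $t;
    }
    return $pid;
}

# The PID check: sample before and after the switch, then compare.
sub pid_check_passes {
    my ($t_before, $t_after, $history) = @_;
    return pid_at($t_before, $history) == pid_at($t_after, $history) ? 1 : 0;
}

# First sample taken before the replacement: the restart is detected.
my $early = pid_check_passes(2, 8, \@history);

# First sample taken after the replacement: both samples see the new
# process, so the check passes even though the receiver was restarted.
my $late = pid_check_passes(6, 8, \@history);

print "first sample at t=2: check ", ($early ? 'passes' : 'fails'), "\n";
print "first sample at t=6: check ", ($late ? 'passes' : 'fails'), "\n";
```

This is why the PID check complements the log check rather than replacing it: the PID check can only catch a restart that happens between the two samples.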

Like 17b2d5ec75, backpatch down to v13.

Author: Bryan Green <dbryan.green@gmail.com>
Co-authored-by: Xuneng Zhou <xunengzhou@gmail.com>
Discussion: https://postgr.es/m/9d00b597-d64a-4f1e-802e-90f9dc394c70@gmail.com
Backpatch-through: 13
Michael Paquier
2025-11-05 16:48:19 +09:00
parent 5509055d69
commit a4fd971c6f


@@ -54,8 +54,19 @@ $node_standby_2->append_conf(
 	'postgresql.conf', qq(
 primary_conninfo='$connstr_1'
 ));
+
+# Rotate logfile before restarting, for the log checks done below.
+$node_standby_2->rotate_logfile;
 $node_standby_2->restart;
 
+# Wait for walreceiver to reconnect after the restart. We want to
+# verify that after reconnection, the walreceiver stays alive during
+# the timeline switch.
+$node_standby_2->poll_query_until('postgres',
+	"SELECT EXISTS(SELECT 1 FROM pg_stat_wal_receiver)");
+my $wr_pid_before_switch = $node_standby_2->safe_psql('postgres',
+	"SELECT pid FROM pg_stat_wal_receiver");
+
 # Insert some data in standby 1 and check its presence in standby 2
 # to ensure that the timeline switch has been done.
 $node_standby_1->safe_psql('postgres',
@@ -75,6 +86,14 @@ ok( !$node_standby_2->log_contains(
 	),
 	'WAL receiver should not be stopped across timeline jumps');
 
+# Verify that the walreceiver process stayed alive across the timeline
+# switch, check its PID.
+my $wr_pid_after_switch = $node_standby_2->safe_psql('postgres',
+	"SELECT pid FROM pg_stat_wal_receiver");
+
+is($wr_pid_before_switch, $wr_pid_after_switch,
+	'WAL receiver PID matches across timeline jumps');
+
 # Ensure that a standby is able to follow a primary on a newer timeline
 # when WAL archiving is enabled.