Prevent references to invalid relation pages after fresh promotion

If a standby crashes after promotion before having completed its first post-recovery checkpoint, then the minimal recovery point which marks the LSN position where the cluster is able to reach consistency may be set to a position older than the first end-of-recovery checkpoint while all the WAL available should be replayed. This leads to the instance thinking that it contains inconsistent pages, causing a PANIC and a hard instance crash even if all the WAL available has not been replayed for certain sets of records replayed. When in crash recovery, minRecoveryPoint is expected to always be set to InvalidXLogRecPtr, which forces the recovery to replay all the WAL available, so this commit makes sure that the local copy of minRecoveryPoint from the control file is initialized properly and stays as it is while crash recovery is performed. Once switching to archive recovery or if crash recovery finishes, then the local copy minRecoveryPoint can be safely updated. Pavan Deolasee has reported and diagnosed the failure in the first place, and the base fix idea to rely on the local copy of minRecoveryPoint comes from Kyotaro Horiguchi, which has been expanded into a full-fledged patch by me. The test included in this commit has been written by Álvaro Herrera and Pavan Deolasee, which I have modified to make it faster and more reliable with sleep phases. Backpatch down to all supported versions where the bug appears, aka 9.3 which is where the end-of-recovery checkpoint is not run by the startup process anymore. The test gets easily supported down to 10, still it has been tested on all branches. Reported-by: Pavan Deolasee Diagnosed-by: Pavan Deolasee Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro Herrera Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
2025-07-18 17:42:25 +03:00 · 2018-07-05 10:46:18 +09:00
parent 249126e761
commit 3c64dcb1e3
2 changed files with 157 additions and 31 deletions
--- a/src/test/recovery/t/015_promotion_pages.pl
+++ b/src/test/recovery/t/015_promotion_pages.pl
@ -0,0 +1,87 @@
+# Test for promotion handling with WAL records generated post-promotion
+# before the first checkpoint is generated.  This test case checks for
+# invalid page references at replay based on the minimum consistent
+# recovery point defined.
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib;
+use Test::More tests => 1;
+
+# Initialize primary node
+my $alpha = get_new_node('alpha');
+$alpha->init(allows_streaming => 1);
+# Setting wal_log_hints to off is important to get invalid page
+# references.
+$alpha->append_conf("postgresql.conf", <<EOF);
+wal_log_hints = off
+EOF
+
+# Start the primary
+$alpha->start;
+
+# setup/start a standby
+$alpha->backup('bkp');
+my $bravo = get_new_node('bravo');
+$bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);
+$bravo->append_conf('postgresql.conf', <<EOF);
+checkpoint_timeout=1h
+checkpoint_completion_target=0.9
+EOF
+$bravo->start;
+
+# Dummy table for the upcoming tests.
+$alpha->safe_psql('postgres', 'create table test1 (a int)');
+$alpha->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');
+
+# take a checkpoint
+$alpha->safe_psql('postgres', 'checkpoint');
+
+# The following vacuum will set visibility map bits and create
+# problematic WAL records.
+$alpha->safe_psql('postgres', 'vacuum verbose test1');
+# Wait for last record to have been replayed on the standby.
+$alpha->wait_for_catchup($bravo, 'replay',
+						 $alpha->lsn('insert'));
+
+# Now force a checkpoint on the standby. This seems unnecessary but for "some"
+# reason, the previous checkpoint on the primary does not reflect on the standby
+# and without an explicit checkpoint, it may start redo recovery from a much
+# older point, which includes even create table and initial page additions.
+$bravo->safe_psql('postgres', 'checkpoint');
+
+# Now just use a dummy table and run some operations to move minRecoveryPoint
+# beyond the previous vacuum.
+$alpha->safe_psql('postgres', 'create table test2 (a int, b text)');
+$alpha->safe_psql('postgres', 'insert into test2 select generate_series(1,10000), md5(random()::text)');
+$alpha->safe_psql('postgres', 'truncate test2');
+
+# Wait again for all records to be replayed.
+$alpha->wait_for_catchup($bravo, 'replay',
+						 $alpha->lsn('insert'));
+
+# Do the promotion, which reinitializes minRecoveryPoint in the control
+# file so as WAL is replayed up to the end.
+$bravo->promote;
+
+# Truncate the table on the promoted standby, vacuum and extend it
+# again to create new page references.  The first post-recovery checkpoint
+# has not happened yet.
+$bravo->safe_psql('postgres', 'truncate test1');
+$bravo->safe_psql('postgres', 'vacuum verbose test1');
+$bravo->safe_psql('postgres', 'insert into test1 select generate_series(1,1000)');
+
+# Now crash-stop the promoted standby and restart.  This makes sure that
+# replay does not see invalid page references because of an invalid
+# minimum consistent recovery point.
+$bravo->stop('immediate');
+$bravo->start;
+
+# Check state of the table after full crash recovery.  All its data should
+# be here.
+my $psql_out;
+$bravo->psql(
+	'postgres',
+	"SELECT count(*) FROM test1",
+	stdout => \$psql_out);
+is($psql_out, '1000', "Check that table state is correct");