mirror of https://github.com/postgres/postgres.git synced 2025-07-08 11:42:09 +03:00

Fix recovery_prefetch with low maintenance_io_concurrency.

We should process completed IOs *before* trying to start more, so that
it is always possible to decode one more record when the decoded record
queue is empty, even if maintenance_io_concurrency is set so low that a
single earlier WAL record might have saturated the IO queue.
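
The ordering issue can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the actual prefetcher code: `ToyPrefetcher` and every `toy_*` name are hypothetical, each record is assumed to start exactly one IO, and `max_inflight` stands in for maintenance_io_concurrency.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
typedef struct ToyPrefetcher
{
    int max_inflight;   /* stands in for maintenance_io_concurrency */
    int inflight;       /* IOs started and not yet retired */
    int finished;       /* of those, how many have actually completed */
    int decoded;        /* records in the decoded record queue */
} ToyPrefetcher;

/* Retire IOs that have completed, freeing slots in the IO queue. */
static void
toy_retire_completed(ToyPrefetcher *p)
{
    assert(p->finished <= p->inflight);
    p->inflight -= p->finished;
    p->finished = 0;
}

/* Decode one more record (starting one IO) if the IO queue has room. */
static bool
toy_try_decode(ToyPrefetcher *p)
{
    if (p->inflight >= p->max_inflight)
        return false;
    p->inflight++;
    p->decoded++;
    return true;
}

/* Old ordering: try to start more first, then process completions. */
static bool
toy_step_start_first(ToyPrefetcher *p)
{
    bool ok = toy_try_decode(p);

    toy_retire_completed(p);
    return ok;
}

/* Fixed ordering: process completions first, then try to start more. */
static bool
toy_step_complete_first(ToyPrefetcher *p)
{
    toy_retire_completed(p);
    return toy_try_decode(p);
}
```

In this model, with `max_inflight` set to 1, one IO in flight that has already finished, and an empty decoded queue, `toy_step_start_first()` fails to decode anything, while `toy_step_complete_first()` frees the slot first and then decodes one record.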

That bug was hidden because the effect of maintenance_io_concurrency was
arbitrarily clamped to be at least 2.  Fix the ordering, and also remove
that clamp.  We need a special case for 0, which is now treated the same
as recovery_prefetch=off, but otherwise the number is used directly.
This allows for testing with 1, which would have made the problem
obvious in simple test scenarios.
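
As a rough sketch of the setting semantics described above (hypothetical helper names, not the real GUC handling code), the old behavior clamped the value to at least 2, while the new behavior treats 0 as "prefetching off" and otherwise uses the number as is:

```c
#include <stdbool.h>

/* Hypothetical illustration only; these names are not PostgreSQL symbols. */
typedef struct ToyPrefetchSettings
{
    bool enabled;       /* is prefetching effectively on? */
    int  max_inflight;  /* IO queue depth actually used */
} ToyPrefetchSettings;

/* Old behavior: the effect of the setting was clamped to at least 2. */
static int
toy_old_max_inflight(int maintenance_io_concurrency)
{
    return maintenance_io_concurrency < 2 ? 2 : maintenance_io_concurrency;
}

/* New behavior: 0 acts like recovery_prefetch=off, otherwise use it directly. */
static ToyPrefetchSettings
toy_new_settings(bool recovery_prefetch, int maintenance_io_concurrency)
{
    ToyPrefetchSettings s;

    s.enabled = recovery_prefetch && maintenance_io_concurrency > 0;
    s.max_inflight = s.enabled ? maintenance_io_concurrency : 0;
    return s;
}
```

Under the old clamp, a setting of 1 silently behaved like 2, which is why a test with 1 could not expose the ordering problem.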

Also add an explicit error message for missing contrecords.  It was a
bit strange that we didn't report an error already, and the omission became a
latent bug with prefetching, since the internal state that tracks aborted
contrecords would not survive retrying, as revealed by
026_overwrite_contrecord.pl with this adjustment.  Reporting an error
prevents that.
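
The "report only if nothing was reported yet" rule can be sketched in isolation as follows. This is a simplified stand-in, assuming a fixed-size message buffer: `ToyReader` and `toy_report_missing_contrecord` are hypothetical names, and the two `%X` arguments mimic what `LSN_FORMAT_ARGS` expands to (the high and low 32 bits of the LSN).

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the reader state; only the error buffer matters. */
typedef struct ToyReader
{
    char errormsg_buf[128];
} ToyReader;

/*
 * Record a "missing contrecord" error unless an earlier error is already
 * present, in which case the existing error takes precedence.
 */
static void
toy_report_missing_contrecord(ToyReader *reader, uint64_t rec_ptr)
{
    if (reader->errormsg_buf[0] != '\0')
        return;

    snprintf(reader->errormsg_buf, sizeof(reader->errormsg_buf),
             "missing contrecord at %X/%X",
             (unsigned int) (rec_ptr >> 32), (unsigned int) rec_ptr);
}
```

Because the buffer is non-empty after the first report, a caller that checks for an error has a reason to stop rather than retry and overwrite the aborted-contrecord state.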

Back-patch to 15.

Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com
Thomas Munro
2022-09-08 20:25:20 +12:00
parent 12d40d4a8d
commit adb466150b
3 changed files with 56 additions and 23 deletions


@@ -275,22 +275,24 @@ XLogBeginRead(XLogReaderState *state, XLogRecPtr RecPtr)
 }

 /*
- * See if we can release the last record that was returned by
- * XLogNextRecord(), if any, to free up space.
+ * Release the last record that was returned by XLogNextRecord(), if any, to
+ * free up space. Returns the LSN past the end of the record.
  */
-void
+XLogRecPtr
 XLogReleasePreviousRecord(XLogReaderState *state)
 {
 	DecodedXLogRecord *record;
+	XLogRecPtr next_lsn;

 	if (!state->record)
-		return;
+		return InvalidXLogRecPtr;

 	/*
 	 * Remove it from the decoded record queue. It must be the oldest item
 	 * decoded, decode_queue_head.
 	 */
 	record = state->record;
+	next_lsn = record->next_lsn;
 	Assert(record == state->decode_queue_head);
 	state->record = NULL;
 	state->decode_queue_head = record->next;
@@ -336,6 +338,8 @@ XLogReleasePreviousRecord(XLogReaderState *state)
 			state->decode_buffer_tail = state->decode_buffer;
 		}
 	}
+
+	return next_lsn;
 }

 /*
@@ -907,6 +911,17 @@ err:
 		 */
 		state->abortedRecPtr = RecPtr;
 		state->missingContrecPtr = targetPagePtr;
+
+		/*
+		 * If we got here without reporting an error, report one now so that
+		 * XLogPrefetcherReadRecord() doesn't bring us back a second time and
+		 * clobber the above state. Otherwise, the existing error takes
+		 * precedence.
+		 */
+		if (!state->errormsg_buf[0])
+			report_invalid_record(state,
+								  "missing contrecord at %X/%X",
+								  LSN_FORMAT_ARGS(RecPtr));
 	}

 	if (decoded && decoded->oversized)