From 15f0cb26b530b6725a37391738cfc62d4745c49b Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Tue, 8 Apr 2025 02:41:03 -0400
Subject: [PATCH] Increase BAS_BULKREAD based on effective_io_concurrency

Before, BAS_BULKREAD was always of size 256kB. With the default
io_combine_limit of 16, that only allowed 1-2 IOs to be in flight -
insufficient even on very low latency storage.

We don't just want to increase the size to a much larger hardcoded value, as
very large rings (10s of MBs of buffers) appear to have negative performance
effects when reading in data that the OS has cached (but not when actually
needing to do IO).

To address this, increase the size of BAS_BULKREAD to allow for
io_combine_limit * effective_io_concurrency buffers getting read in. To
prevent the ring being much larger than useful, limit the increased size with
GetPinLimit().

The formula outlined above keeps the ring size to sizes for which we have not
observed performance regressions, unless very large effective_io_concurrency
values are used together with a large shared_buffers setting.

Reviewed-by: Thomas Munro
Discussion: https://postgr.es/m/lqwghabtu2ak4wknzycufqjm5ijnxhb4k73vzphlt2a3wsemcd@gtftg44kdim6
Discussion: https://postgr.es/m/uvrtrknj4kdytuboidbhwclo4gxhswwcpgadptsjvjqcluzmah@brqs62irg4dt
---
 src/backend/storage/buffer/freelist.c | 46 +++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 336715b6c63..e1f8e5e97bd 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -555,8 +555,50 @@ GetAccessStrategy(BufferAccessStrategyType btype)
 			return NULL;
 
 		case BAS_BULKREAD:
-			ring_size_kb = 256;
-			break;
+			{
+				int			ring_max_kb;
+
+				/*
+				 * The ring always needs to be large enough to allow some
+				 * separation in time between providing a buffer to the user
+				 * of the strategy and that buffer being reused. Otherwise the
+				 * user's pin will prevent reuse of the buffer, even without
+				 * concurrent activity.
+				 *
+				 * We also need to ensure the ring always is large enough for
+				 * SYNC_SCAN_REPORT_INTERVAL, as noted above.
+				 *
+				 * Thus we start out with a minimal size and increase the size
+				 * further if appropriate.
+				 */
+				ring_size_kb = 256;
+
+				/*
+				 * There's no point in a larger ring if we won't be allowed to
+				 * pin sufficiently many buffers. But we never limit to less
+				 * than the minimal size above.
+				 */
+				ring_max_kb = GetPinLimit() * (BLCKSZ / 1024);
+				ring_max_kb = Max(ring_size_kb, ring_max_kb);
+
+				/*
+				 * We would like the ring to additionally have space for the
+				 * configured degree of IO concurrency. While being read
+				 * in, buffers can obviously not yet be reused.
+				 *
+				 * Each IO can be up to io_combine_limit blocks large, and we
+				 * want to start up to effective_io_concurrency IOs.
+				 *
+				 * Note that effective_io_concurrency may be 0, which disables
+				 * AIO.
+				 */
+				ring_size_kb += (BLCKSZ / 1024) *
+					io_combine_limit * effective_io_concurrency;
+
+				if (ring_size_kb > ring_max_kb)
+					ring_size_kb = ring_max_kb;
+				break;
+			}
 		case BAS_BULKWRITE:
 			ring_size_kb = 16 * 1024;
 			break;
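
For readers who want to check the sizing arithmetic outside the server, the
following is a minimal standalone sketch of the same formula. It is not
PostgreSQL code: the function name bulkread_ring_size_kb and its parameters
are hypothetical, the value that GetPinLimit() would return is passed in as
pin_limit, and plain int arithmetic is assumed to be wide enough for the
sample values used here.

```c
#include <assert.h>

/*
 * Hypothetical standalone helper mirroring the BAS_BULKREAD sizing logic
 * in the patch.  All inputs are passed explicitly instead of being read
 * from PostgreSQL globals; pin_limit stands in for GetPinLimit().
 */
int
bulkread_ring_size_kb(int pin_limit, int blcksz,
					  int io_combine_limit, int effective_io_concurrency)
{
	int			block_kb;
	int			ring_size_kb;
	int			ring_max_kb;

	/* Sanity checks for this sketch only; the patch has no equivalent. */
	assert(blcksz >= 1024);
	assert(pin_limit >= 0);
	assert(io_combine_limit >= 0);
	assert(effective_io_concurrency >= 0);

	block_kb = blcksz / 1024;

	/* Minimal ring size, as before the patch. */
	ring_size_kb = 256;

	/* Upper bound from the pin limit, but never below the minimum. */
	ring_max_kb = pin_limit * block_kb;
	if (ring_max_kb < ring_size_kb)
		ring_max_kb = ring_size_kb;

	/* Room for effective_io_concurrency IOs of io_combine_limit blocks. */
	ring_size_kb += block_kb * io_combine_limit * effective_io_concurrency;

	/* Clamp to the pin-limit-derived maximum. */
	if (ring_size_kb > ring_max_kb)
		ring_size_kb = ring_max_kb;

	return ring_size_kb;
}
```

With 8kB blocks, io_combine_limit = 16 and effective_io_concurrency = 16,
the helper returns 2304 (256 + 8 * 16 * 16) as long as pin_limit is at
least 288 buffers. With effective_io_concurrency = 0 it returns the
previous fixed size of 256, and a small pin_limit clamps the result back
to 256 regardless of the other settings.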