
Allocate consecutive blocks during parallel seqscans

Previously, during a parallel sequential scan, we would allocate blocks to
parallel workers one block at a time.  Since other workers were likely to
request a block before a worker had returned for its next block number to
work on, this could lead to non-sequential I/O patterns in each worker,
which could cause the operating system's readahead to perform poorly or not
at all.

Here we change things so that we allocate consecutive "chunks" of blocks
to workers and have them work on those until they're done, at which time
we allocate another chunk for the worker.  The size of these chunks is
based on the size of the relation.
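
To illustrate the idea, here is a standalone sketch (not the code added by
this commit; all names are illustrative): each worker claims a run of
consecutive block numbers from a shared atomic counter and works through
that run before coming back for more, so the blocks each worker reads stay
sequential and readahead-friendly.

/*
 * Standalone sketch of chunked block allocation.  A shared counter hands
 * out chunks of consecutive blocks; each worker consumes its chunk
 * sequentially before asking for another.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

typedef struct SharedScanState
{
	_Atomic uint64_t nallocated;	/* blocks handed out so far */
	BlockNumber nblocks;			/* total blocks in the relation */
	uint32_t	chunk_size;			/* blocks per chunk */
} SharedScanState;

typedef struct WorkerScanState
{
	BlockNumber next_block;			/* next block in the current chunk */
	uint32_t	chunk_remaining;	/* blocks left in the current chunk */
} WorkerScanState;

/* Return the next block for this worker, or false when the scan is done. */
static bool
next_block(SharedScanState *shared, WorkerScanState *worker, BlockNumber *blkno)
{
	if (worker->chunk_remaining == 0)
	{
		uint64_t	start = atomic_fetch_add(&shared->nallocated,
											 shared->chunk_size);

		if (start >= shared->nblocks)
			return false;		/* all blocks have been allocated */

		worker->next_block = (BlockNumber) start;
		worker->chunk_remaining = shared->chunk_size;
		/* The final chunk may be cut short by the end of the relation. */
		if (start + shared->chunk_size > shared->nblocks)
			worker->chunk_remaining = (uint32_t) (shared->nblocks - start);
	}

	*blkno = worker->next_block++;
	worker->chunk_remaining--;
	return true;
}

In the committed patch the shared counter lives in the parallel scan
descriptor, while the per-worker chunk bookkeeping lives in
ParallelBlockTableScanWorkerData, shown in the header changes below.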

The initial patch, by Thomas Munro, showed good improvements with a fixed
chunk size of 64 blocks and a simple ramp-down near the end of the scan.
The revisions to base the chunk size on the relation size and to adjust the
ramp-down in powers of two were done by me, along with quite extensive
benchmarking to determine the optimal chunk sizes.
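
As a rough illustration of how such a heuristic can work (the constants and
formula below are assumptions chosen for the example, not necessarily the
ones used by the committed patch), the chunk size can be derived from the
relation size, rounded up to a power of two and capped, and then halved
repeatedly as the scan nears its end:

#include <stdint.h>

#define SEQSCAN_NCHUNKS			2048	/* aim for roughly this many chunks */
#define SEQSCAN_MAX_CHUNK_SIZE	8192	/* upper bound on blocks per chunk */

/* Smallest power of two >= v, for v >= 1. */
static uint32_t
next_power_of_2(uint32_t v)
{
	uint32_t	p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Pick an initial chunk size from the relation size in blocks. */
static uint32_t
choose_chunk_size(uint32_t nblocks)
{
	uint32_t	target = nblocks / SEQSCAN_NCHUNKS;
	uint32_t	chunk_size = next_power_of_2(target > 0 ? target : 1);

	return chunk_size > SEQSCAN_MAX_CHUNK_SIZE ? SEQSCAN_MAX_CHUNK_SIZE : chunk_size;
}

/*
 * Ramp-down: once few blocks remain, halve the chunk size (keeping it a
 * power of two) so a worker stuck with a large final chunk doesn't leave
 * the other workers idle at the end of the scan.
 */
static uint32_t
ramp_down(uint32_t chunk_size, uint64_t blocks_remaining)
{
	while (chunk_size > 1 && blocks_remaining < (uint64_t) chunk_size * 64)
		chunk_size >>= 1;
	return chunk_size;
}

Keeping the chunk size a power of two makes the ramp-down a simple halving
of the current size rather than a full recomputation mid-scan.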

For the most part, benchmarks have shown significant performance
improvements for large parallel sequential scans on Linux, FreeBSD and
Windows using SSDs.  It's less clear how this affects performance on cloud
providers; tests done so far have been unable to obtain stable enough
numbers to provide meaningful benchmark results.  It is possible that this
could cause some performance regressions on more obscure filesystems, so we
may later need to provide users with some way to get behavior closer to the
old one.  For now, let's leave that until we see that it's really required.

Author: Thomas Munro, David Rowley
Reviewed-by: Ranier Vilela, Soumyadeep Chakraborty, Robert Haas
Reviewed-by: Amit Kapila, Kirk Jamison
Discussion: https://postgr.es/m/CA+hUKGJ_EErDv41YycXcbMbCBkztA34+z1ts9VQH+ACRuvpxig@mail.gmail.com
Committed by David Rowley
2020-07-26 21:02:45 +12:00
parent 11a68e4b53
commit 56788d2156
4 changed files with 144 additions and 12 deletions

src/include/access/relscan.h

@@ -42,9 +42,9 @@ typedef struct TableScanDescData
	 */
	uint32		rs_flags;
	void	   *rs_private;		/* per-worker private memory for AM to use */
	struct ParallelTableScanDescData *rs_parallel;	/* parallel scan
													 * information */
} TableScanDescData;
typedef struct TableScanDescData *TableScanDesc;
@@ -81,6 +81,18 @@ typedef struct ParallelBlockTableScanDescData
} ParallelBlockTableScanDescData;
typedef struct ParallelBlockTableScanDescData *ParallelBlockTableScanDesc;

/*
 * Per backend state for parallel table scan, for block-oriented storage.
 */
typedef struct ParallelBlockTableScanWorkerData
{
	uint64		phsw_nallocated;	/* Current # of blocks into the scan */
	uint32		phsw_chunk_remaining;	/* # blocks left in this chunk */
	uint32		phsw_chunk_size;	/* The number of blocks to allocate in
									 * each I/O chunk for the scan */
} ParallelBlockTableScanWorkerData;
typedef struct ParallelBlockTableScanWorkerData *ParallelBlockTableScanWorker;

/*
 * Base class for fetches from a table via an index. This is the base-class
 * for such scans, which needs to be embedded in the respective struct for
src/include/access/tableam.h

@@ -1793,8 +1793,10 @@ extern Size table_block_parallelscan_initialize(Relation rel,
extern void table_block_parallelscan_reinitialize(Relation rel,
												  ParallelTableScanDesc pscan);
extern BlockNumber table_block_parallelscan_nextpage(Relation rel,
													  ParallelBlockTableScanWorker pbscanwork,
													  ParallelBlockTableScanDesc pbscan);
extern void table_block_parallelscan_startblock_init(Relation rel,
													  ParallelBlockTableScanWorker pbscanwork,
													  ParallelBlockTableScanDesc pbscan);
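
For context, a block-oriented table AM worker would drive the scan with
these functions roughly as follows.  This is a hedged sketch assuming
PostgreSQL's internal headers; process_block() is a hypothetical stand-in
for the per-block work, and zeroing the worker struct before the first call
is an assumption of this sketch rather than a documented requirement.

#include "postgres.h"

#include "access/relscan.h"
#include "access/tableam.h"

/* Hypothetical per-block work done by the caller; not part of this commit. */
extern void process_block(Relation rel, BlockNumber blkno);

static void
scan_blocks_in_parallel(Relation rel, ParallelBlockTableScanDesc pbscan)
{
	ParallelBlockTableScanWorkerData pbscanwork;
	BlockNumber blkno;

	/* Per-worker chunk bookkeeping; assumed to start out zeroed here. */
	memset(&pbscanwork, 0, sizeof(pbscanwork));

	/* Determine the starting block (and chunk size) for this worker. */
	table_block_parallelscan_startblock_init(rel, &pbscanwork, pbscan);

	/*
	 * Each call hands back the next block of the worker's current chunk,
	 * claiming a fresh chunk from shared state when the chunk is used up,
	 * and returns InvalidBlockNumber once the whole relation is allocated.
	 */
	while ((blkno = table_block_parallelscan_nextpage(rel, &pbscanwork,
													  pbscan)) != InvalidBlockNumber)
		process_block(rel, blkno);
}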