From cb988b04d0637f5d55bfb5a01b23dd4c39beccfa Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <msawada@postgresql.org>
Date: Wed, 16 Oct 2024 12:07:52 -0700
Subject: [PATCH] Reduce memory block size for decoded tuple storage to 8kB.

Commit a4ccc1cef introduced the Generation Context and modified
logical decoding to store decoded tuple data (i.e., rb->tup_context)
in a Generation Context with a fixed block size of 8MB. Several reports
have indicated that the logical decoding process can be terminated by
out-of-memory (OOM) errors caused by excessive memory usage in
rb->tup_context.

This issue can occur when decoding a workload involving several
concurrent transactions, including a long-running transaction that
modifies tuples. By design, the Generation Context does not free a
memory block until all chunks within that block are
released. Consequently, if tuples modified by the long-running
transaction are stored across multiple memory blocks, these blocks
remain allocated until the long-running transaction completes, leading
to substantial memory fragmentation. The memory usage during logical
decoding, tracked by rb->size, does not account for memory
fragmentation, resulting in potentially much higher memory consumption
than the value of the logical_decoding_work_mem parameter.
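
The fragmentation effect described above can be illustrated with a toy
model. This is a hypothetical simplification, not PostgreSQL's
generation.c. The names simulate_generation, ToyBlock and ToyResult are
made up. Every tuple has the same size, short-transaction tuples are
freed as soon as they are allocated, and block reuse is ignored. The
model only applies the rule stated above: a block is released once all
of its chunks have been freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-block bookkeeping in a toy generation-style allocator (hypothetical). */
typedef struct ToyBlock
{
	size_t		used;			/* bytes carved from this block so far */
	int			nchunks;		/* chunks allocated from this block */
	int			nfree;			/* chunks already freed */
} ToyBlock;

typedef struct ToyResult
{
	size_t		live_bytes;		/* bytes in live chunks, roughly what rb->size counts */
	size_t		held_bytes;		/* bytes in blocks that cannot be released */
} ToyResult;

/*
 * Allocate n_allocs tuples of tuple_size bytes, carving them sequentially
 * from fixed-size blocks.  Every long_every-th tuple belongs to a
 * long-running transaction and stays live; every other tuple belongs to a
 * short transaction and is freed at once.  A block counts as held if any
 * of its chunks is live or it is the block currently being filled.
 */
static ToyResult
simulate_generation(size_t block_size, size_t tuple_size,
					int n_allocs, int long_every)
{
	ToyResult	res = {0, 0};
	size_t		per_block;
	size_t		max_blocks;
	size_t		nblocks = 0;
	ToyBlock   *blocks;

	assert(tuple_size > 0 && tuple_size <= block_size);
	assert(n_allocs >= 0 && long_every > 0);

	/* Each block holds floor(block_size / tuple_size) tuples. */
	per_block = block_size / tuple_size;
	max_blocks = (size_t) n_allocs / per_block + 1;
	blocks = calloc(max_blocks, sizeof(ToyBlock));
	if (blocks == NULL)
		abort();

	for (int i = 0; i < n_allocs; i++)
	{
		ToyBlock   *cur;

		/* Open a new block when the current one cannot fit the tuple. */
		if (nblocks == 0 || blocks[nblocks - 1].used + tuple_size > block_size)
			nblocks++;
		cur = &blocks[nblocks - 1];
		cur->used += tuple_size;
		cur->nchunks++;

		if (i % long_every == 0)
			res.live_bytes += tuple_size;	/* long-running txn keeps it */
		else
			cur->nfree++;		/* short txn frees it */
	}

	/* A block stays allocated while any chunk in it is still live. */
	for (size_t b = 0; b < nblocks; b++)
	{
		if (blocks[b].nfree < blocks[b].nchunks || b + 1 == nblocks)
			res.held_bytes += block_size;
	}

	free(blocks);
	return res;
}
```

Under these assumptions, simulate_generation(8 * 1024 * 1024, 100,
1000000, 10000) reports 10,000 live bytes while all 12 blocks (96MB)
stay allocated, because every 8MB block contains at least one tuple of
the long-running transaction. With 8kB blocks the same workload holds
101 blocks, or 808kB.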

Various improvement strategies were considered in the thread linked
below. This change reduces the block size of the Generation Context
used in rb->tup_context from 8MB to 8kB. This modification
significantly decreases the likelihood of substantial memory
fragmentation and is relatively straightforward to backport.
Performance testing across multiple platforms showed no performance
degradation that would affect actual operation.
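
The effect of the smaller block size can also be framed as a rough
worst-case bound. Assume, hypothetically, that each chunk kept alive by
a long-running transaction lands in a different block and pins it. The
retained memory is then at most the number of such chunks times the
block size, capped by the total amount allocated. The function name
pinned_bytes_upper_bound is illustrative only, and the bound ignores
block reuse and per-chunk headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough worst-case estimate (hypothetical model, not from the patch): each
 * chunk kept live by a long-running transaction sits in a different block
 * and pins that whole block.  The context can never hold more blocks than
 * were allocated, so the bound is capped at total_alloc rounded up to
 * whole blocks.
 */
static uint64_t
pinned_bytes_upper_bound(uint64_t long_live_chunks, uint64_t block_size,
						 uint64_t total_alloc)
{
	uint64_t	pinned;
	uint64_t	cap;

	assert(block_size > 0);

	pinned = long_live_chunks * block_size;
	cap = (total_alloc + block_size - 1) / block_size * block_size;

	return pinned < cap ? pinned : cap;
}
```

For example, with 1000 live chunks within 1GB of decoded tuple data,
the bound evaluates to the full 1GB with 8MB blocks but only 8,192,000
bytes (under 8MB) with 8kB blocks.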

Backport to all supported branches.

Reported-by: Alex Richman, Michael Guissine, Avi Weinberg
Reviewed-by: Amit Kapila, Fujii Masao, David Rowley
Tested-by: Hayato Kuroda, Shlok Kyal
Discussion: https://postgr.es/m/CAD21AoBTY1LATZUmvSXEssvq07qDZufV4AF-OHh9VD2pC0VY2A%40mail.gmail.com
Backpatch-through: 12
---
 src/backend/replication/logical/reorderbuffer.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index fb323a80ec9..c7f8fa6216c 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -300,9 +300,17 @@ ReorderBufferAllocate(void)
 											SLAB_DEFAULT_BLOCK_SIZE,
 											sizeof(ReorderBufferTXN));
 
+	/*
+	 * To minimize memory fragmentation caused by long-running transactions
+	 * with changes spanning multiple memory blocks, we use a small, fixed
+	 * memory block size for decoded tuple storage. The performance
+	 * testing showed that the default memory block size maintains logical
+	 * decoding performance without causing fragmentation due to concurrent
+	 * transactions.
+	 */
 	buffer->tup_context = GenerationContextCreate(new_ctx,
 												  "Tuples",
-												  SLAB_LARGE_BLOCK_SIZE);
+												  SLAB_DEFAULT_BLOCK_SIZE);
 
 	hash_ctl.keysize = sizeof(TransactionId);
 	hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);