From cb988b04d0637f5d55bfb5a01b23dd4c39beccfa Mon Sep 17 00:00:00 2001 From: Masahiko Sawada Date: Wed, 16 Oct 2024 12:07:52 -0700 Subject: [PATCH] Reduce memory block size for decoded tuple storage to 8kB. Commit a4ccc1cef introduced the Generation Context and modified the logical decoding process to use a Generation Context with a fixed block size of 8MB for storing tuple data decoded during logical decoding (i.e., rb->tup_context). Several reports have indicated that the logical decoding process can be terminated due to out-of-memory (OOM) situations caused by excessive memory usage in rb->tup_context. This issue can occur when decoding a workload involving several concurrent transactions, including a long-running transaction that modifies tuples. By design, the Generation Context does not free a memory block until all chunks within that block are released. Consequently, if tuples modified by the long-running transaction are stored across multiple memory blocks, these blocks remain allocated until the long-running transaction completes, leading to substantial memory fragmentation. The memory usage during logical decoding, tracked by rb->size, does not account for memory fragmentation, resulting in potentially much higher memory consumption than the value of the logical_decoding_work_mem parameter. Various improvement strategies were discussed in the relevant thread. This change reduces the block size of the Generation Context used in rb->tup_context from 8MB to 8kB. This modification significantly decreases the likelihood of substantial memory fragmentation occurring and is relatively straightforward to backport. Performance testing across multiple platforms has confirmed that this change will not introduce any performance degradation that would impact actual operation. Backport to all supported branches. Reported-by: Alex Richman, Michael Guissine, Avi Weinberg Reviewed-by: Amit Kapila, Fujii Masao, David Rowley Tested-by: Hayato Kuroda, Shlok Kyal Discussion: https://postgr.es/m/CAD21AoBTY1LATZUmvSXEssvq07qDZufV4AF-OHh9VD2pC0VY2A%40mail.gmail.com Backpatch-through: 12 --- src/backend/replication/logical/reorderbuffer.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c index fb323a80ec9..c7f8fa6216c 100644 --- a/src/backend/replication/logical/reorderbuffer.c +++ b/src/backend/replication/logical/reorderbuffer.c @@ -300,9 +300,17 @@ ReorderBufferAllocate(void) SLAB_DEFAULT_BLOCK_SIZE, sizeof(ReorderBufferTXN)); + /* + * To minimize memory fragmentation caused by long-running transactions + * with changes spanning multiple memory blocks, we use a single + * fixed-size memory block for decoded tuple storage. The performance + * testing showed that the default memory block size maintains logical + * decoding performance without causing fragmentation due to concurrent + * transactions. + */ buffer->tup_context = GenerationContextCreate(new_ctx, "Tuples", - SLAB_LARGE_BLOCK_SIZE); + SLAB_DEFAULT_BLOCK_SIZE); hash_ctl.keysize = sizeof(TransactionId); hash_ctl.entrysize = sizeof(ReorderBufferTXNByIdEnt);