mirror of https://github.com/postgres/postgres.git synced 2025-11-09 06:21:09 +03:00

Preserve conflict-relevant data during logical replication.

Logical replication requires reliable conflict detection to maintain data
consistency across nodes. To achieve this, we must prevent premature
removal of tuples deleted by other origins and their associated commit_ts
data by VACUUM, which could otherwise lead to incorrect conflict reporting
and resolution.

This patch introduces a mechanism to retain deleted tuples on the
subscriber during the application of concurrent transactions from remote
nodes. Retaining these tuples allows us to correctly ignore concurrent
updates to the same tuple. Without this, an UPDATE might be misinterpreted
as an INSERT during conflict resolution because the original tuple is gone.

Additionally, we ensure that origin metadata is not prematurely removed by
vacuum freeze, which is essential for detecting update_origin_differs and
delete_origin_differs conflicts.

To support this, a new replication slot named pg_conflict_detection is
created and maintained by the launcher on the subscriber. Each apply
worker tracks its own non-removable transaction ID, which the launcher
aggregates to determine the appropriate xmin for the slot, thereby
retaining necessary tuples.
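
The aggregation step described above can be sketched roughly as follows. This
is a minimal, standalone illustration and not the code from this patch: the
names sketch_xid_precedes and sketch_compute_slot_xmin are hypothetical, a
value of 0 is assumed to mean "this worker reports nothing", and the
comparison only handles normal XIDs (the special low-numbered XIDs that
PostgreSQL treats separately are ignored here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Assumption for this sketch: 0 means "no value reported by this worker". */
#define SKETCH_INVALID_XID ((TransactionId) 0)

/*
 * Wraparound-aware "a is older than b" test using modulo-2^32 arithmetic.
 * Only meaningful for normal XIDs that are less than 2^31 apart.
 */
static bool
sketch_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Hypothetical launcher-side helper: return the oldest non-removable XID
 * reported by any apply worker, or SKETCH_INVALID_XID if none reported one.
 * The result is what the slot's xmin would be set to in this sketch.
 */
static TransactionId
sketch_compute_slot_xmin(const TransactionId *worker_xids, int nworkers)
{
    TransactionId xmin = SKETCH_INVALID_XID;

    assert(nworkers >= 0);
    assert(worker_xids != NULL || nworkers == 0);

    for (int i = 0; i < nworkers; i++)
    {
        TransactionId xid = worker_xids[i];

        if (xid == SKETCH_INVALID_XID)
            continue;           /* worker has nothing to protect */

        if (xmin == SKETCH_INVALID_XID || sketch_xid_precedes(xid, xmin))
            xmin = xid;         /* keep the oldest value seen so far */
    }

    return xmin;
}
```

For worker values {1200, 0, 1150, 1300} the helper returns 1150: VACUUM must
not remove anything that the slowest worker may still need, so the oldest
reported XID wins.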

Conflict information retention (deleted tuples and commit_ts) can be
enabled per subscription via the retain_conflict_info option. This is
disabled by default to avoid unnecessary overhead for configurations that
do not require conflict resolution or logging.

During upgrades, if any subscription on the old cluster has
retain_conflict_info enabled, a conflict detection slot will be created to
protect relevant tuples from deletion when the new cluster starts.

This is foundational work for correctly detecting update_deleted conflicts,
which will be done in a follow-up patch.

Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Amit Kapila
2025-07-23 02:56:00 +00:00
parent 039f7ee0fe
commit 228c370868
45 changed files with 2234 additions and 221 deletions


@@ -1183,7 +1183,11 @@ EndPrepare(GlobalTransaction gxact)
* starting immediately after the WAL record is inserted could complete
* without fsync'ing our state file. (This is essentially the same kind
* of race condition as the COMMIT-to-clog-write case that
- * RecordTransactionCommit uses DELAY_CHKPT_START for; see notes there.)
+ * RecordTransactionCommit uses DELAY_CHKPT_IN_COMMIT for; see notes
+ * there.) Note that DELAY_CHKPT_IN_COMMIT is used to find transactions in
+ * the critical commit section. We need to know about such transactions
+ * for conflict detection in logical replication. See
+ * GetOldestActiveTransactionId(true, false) and its use.
*
* We save the PREPARE record's location in the gxact for later use by
* CheckPointTwoPhase.
@@ -2298,7 +2302,7 @@ ProcessTwoPhaseBuffer(FullTransactionId fxid,
* RecordTransactionCommitPrepared
*
* This is basically the same as RecordTransactionCommit (q.v. if you change
- * this function): in particular, we must set DELAY_CHKPT_START to avoid a
+ * this function): in particular, we must set DELAY_CHKPT_IN_COMMIT to avoid a
* race condition.
*
* We know the transaction made at least one XLOG entry (its PREPARE),
@@ -2318,7 +2322,7 @@ RecordTransactionCommitPrepared(TransactionId xid,
const char *gid)
{
XLogRecPtr recptr;
- TimestampTz committs = GetCurrentTimestamp();
+ TimestampTz committs;
bool replorigin;
/*
@@ -2331,8 +2335,24 @@ RecordTransactionCommitPrepared(TransactionId xid,
START_CRIT_SECTION();
/* See notes in RecordTransactionCommit */
- Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0);
- MyProc->delayChkptFlags |= DELAY_CHKPT_START;
+ Assert((MyProc->delayChkptFlags & DELAY_CHKPT_IN_COMMIT) == 0);
+ MyProc->delayChkptFlags |= DELAY_CHKPT_IN_COMMIT;
+ /*
+ * Ensures the DELAY_CHKPT_IN_COMMIT flag write is globally visible before
+ * commit time is written.
+ */
+ pg_write_barrier();
+ /*
+ * Note it is important to set committs value after marking ourselves as
+ * in the commit critical section (DELAY_CHKPT_IN_COMMIT). This is because
+ * we want to ensure all transactions that have acquired commit timestamp
+ * are finished before we allow the logical replication client to advance
+ * its xid which is used to hold back dead rows for conflict detection.
+ * See comments atop worker.c.
+ */
+ committs = GetCurrentTimestamp();
/*
* Emit the XLOG commit record. Note that we mark 2PC commits as
@@ -2381,7 +2401,7 @@ RecordTransactionCommitPrepared(TransactionId xid,
TransactionIdCommitTree(xid, nchildren, children);
/* Checkpoint can proceed now */
- MyProc->delayChkptFlags &= ~DELAY_CHKPT_START;
+ MyProc->delayChkptFlags &= ~DELAY_CHKPT_IN_COMMIT;
END_CRIT_SECTION();
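
The ordering that the RecordTransactionCommitPrepared hunks establish (set the
in-commit flag, issue a write barrier, and only then read the commit
timestamp) can be illustrated outside PostgreSQL with C11 atomics. This is a
rough sketch under stated assumptions, not the patch's code: SketchProc,
SKETCH_IN_COMMIT, sketch_now_us and sketch_commit are hypothetical names, the
flag value is arbitrary, and the release fence is only a stand-in for
pg_write_barrier().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical flag bit; the real DELAY_CHKPT_IN_COMMIT value is not assumed. */
#define SKETCH_IN_COMMIT (1u << 2)

/* Hypothetical stand-in for the per-backend flags that other processes read. */
typedef struct SketchProc
{
    atomic_uint flags;
} SketchProc;

/* Wall-clock time in microseconds (stand-in for GetCurrentTimestamp()). */
static int64_t
sketch_now_us(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* True while the process advertises that it is inside the commit section. */
static bool
sketch_in_commit(SketchProc *proc)
{
    return (atomic_load(&proc->flags) & SKETCH_IN_COMMIT) != 0;
}

static int64_t
sketch_commit(SketchProc *proc)
{
    int64_t commit_ts;

    /* 1. Advertise that we are in the commit critical section. */
    assert(!sketch_in_commit(proc));
    atomic_fetch_or_explicit(&proc->flags, SKETCH_IN_COMMIT,
                             memory_order_relaxed);

    /* 2. Order the flag store before any later store of the timestamp. */
    atomic_thread_fence(memory_order_release);

    /* 3. Only now acquire the commit timestamp. */
    commit_ts = sketch_now_us();

    /* ... the commit record would be written and the xact marked committed ... */

    /* 4. Leave the commit critical section. */
    atomic_fetch_and_explicit(&proc->flags, ~SKETCH_IN_COMMIT,
                              memory_order_release);

    return commit_ts;
}
```

The point of the ordering is that an observer which finds no process with the
flag set can conclude that every transaction that has already taken a commit
timestamp has also finished committing; taking the timestamp before setting
the flag would leave a window in which that conclusion is wrong.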