Fix SQL-spec incompatibilities in new transition table feature.

The standard says that all changes of the same kind (insert, update, or delete) caused in one table by a single SQL statement should be reported in a single transition table; and by that, they mean to include foreign key enforcement actions cascading from the statement's direct effects. It's also reasonable to conclude that if the standard had wCTEs, they would say that effects of wCTEs applying to the same table as each other or the outer statement should be merged into one transition table. We weren't doing it like that.

Hence, arrange to merge tuples from multiple update actions into a single transition table as much as we can. There is a problem, which is that if the firing of FK enforcement triggers and after-row triggers with transition tables is interspersed, we might need to report more tuples after some triggers have already seen the transition table. It seems like a bad idea for the transition table to be mutable between trigger calls. There's no good way around this without a major redesign of the FK logic, so for now, resolve it by opening a new transition table each time this happens.

Also, ensure that AFTER STATEMENT triggers fire just once per statement, or once per transition table when we're forced to make more than one. Previous versions of Postgres have allowed each FK enforcement query to cause an additional firing of the AFTER STATEMENT triggers for the referencing table, but that's certainly not per spec. (We're still doing multiple firings of BEFORE STATEMENT triggers, though; is that something worth changing?)

Also, forbid using transition tables with column-specific UPDATE triggers. The spec requires such transition tables to show only the tuples for which the UPDATE trigger would have fired, which means maintaining multiple transition tables or else somehow filtering the contents at readout. Maybe someday we'll bother to support that option, but it looks like a lot of trouble for a marginal feature.

The transition tables are now managed by the AfterTriggers data structures, rather than being directly the responsibility of ModifyTable nodes. This removes a subtransaction-lifespan memory leak introduced by my previous band-aid patch 3c4359521.

In passing, refactor the AfterTriggers data structures to reduce the management overhead for them, by using arrays of structs rather than several parallel arrays for per-query-level and per-subtransaction state. I failed to resist the temptation to do some copy-editing on the SGML docs about triggers, above and beyond merely documenting the effects of this patch.

Back-patch to v10, because we don't want the semantics of transition tables to change post-release.

Patch by me, with help and review from Thomas Munro.

Discussion: https://postgr.es/m/20170909064853.25630.12825@wrigleys.postgresql.org
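
The rule described in the message (merge captured tuples into one transition table, but open a fresh table once a trigger has already read the current one) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyTransitionTable`, `ToyCapture`, and the `toy_*` functions are hypothetical names invented for illustration, tuples are reduced to integer ids, and none of this is the actual code in trigger.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TABLES 8
#define TOY_MAX_TUPLES 16

/* Hypothetical stand-in for one transition table (not PostgreSQL's struct). */
typedef struct ToyTransitionTable
{
	int			tuples[TOY_MAX_TUPLES];	/* captured tuple ids */
	size_t		ntuples;
	bool		closed;			/* set once any trigger has read the table */
} ToyTransitionTable;

/* All tables opened for one relation and event kind within one statement. */
typedef struct ToyCapture
{
	ToyTransitionTable tables[TOY_MAX_TABLES];
	size_t		ntables;
} ToyCapture;

/* Return the table that may still accept tuples, opening a new one if needed. */
ToyTransitionTable *
toy_current_table(ToyCapture *cap)
{
	if (cap->ntables == 0 || cap->tables[cap->ntables - 1].closed)
	{
		ToyTransitionTable *t;

		assert(cap->ntables < TOY_MAX_TABLES);
		t = &cap->tables[cap->ntables++];
		t->ntuples = 0;
		t->closed = false;
	}
	return &cap->tables[cap->ntables - 1];
}

/* Record one changed tuple; direct and cascaded changes share this path. */
void
toy_capture_tuple(ToyCapture *cap, int tuple_id)
{
	ToyTransitionTable *t = toy_current_table(cap);

	assert(t->ntuples < TOY_MAX_TUPLES);
	t->tuples[t->ntuples++] = tuple_id;
}

/* A trigger reads the newest table, which freezes it against later inserts. */
const ToyTransitionTable *
toy_fire_trigger(ToyCapture *cap)
{
	ToyTransitionTable *t;

	if (cap->ntables == 0)
		return NULL;
	t = &cap->tables[cap->ntables - 1];
	t->closed = true;
	return t;
}
```

Starting from a zero-initialized `ToyCapture`, capturing tuples 1 and 2, firing a trigger, and then capturing tuple 3 leaves two tables: the first holds 1 and 2, the second holds only 3. That loosely mirrors the shape of the `self_ref` regression test in this commit, where the triggers first see `(1,), (2,1)` and a later table carries `(3,2)`. If no trigger fires between captures, everything lands in a single table.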
2025-11-16 15:02:33 +03:00 · 2017-09-16 13:20:32 -04:00
parent d2bbd61040
commit 54d4d0ff6c
11 changed files with 804 additions and 391 deletions
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2429,12 +2429,17 @@ CopyFrom(CopyState cstate)
 	/* Triggers might need a slot as well */
 	estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);

+	/* Prepare to catch AFTER triggers. */
+	AfterTriggerBeginQuery();
+
 	/*
 	 * If there are any triggers with transition tables on the named relation,
 	 * we need to be prepared to capture transition tuples.
 	 */
 	cstate->transition_capture =
-		MakeTransitionCaptureState(cstate->rel->trigdesc);
+		MakeTransitionCaptureState(cstate->rel->trigdesc,
+								   RelationGetRelid(cstate->rel),
+								   CMD_INSERT);

 	/*
 	 * If the named relation is a partitioned table, initialize state for
@@ -2510,9 +2515,6 @@ CopyFrom(CopyState cstate)
 		bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple));
 	}

-	/* Prepare to catch AFTER triggers. */
-	AfterTriggerBeginQuery();
-
 	/*
 	 * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
 	 * should do this for COPY, since it's not really an "INSERT" statement as
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -241,11 +241,11 @@ This is a sketch of control flow for full query processing:
 		CreateExecutorState
 			creates per-query context
 		switch to per-query context to run ExecInitNode
+		AfterTriggerBeginQuery
 		ExecInitNode --- recursively scans plan tree
 			CreateExprContext
 				creates per-tuple context
 			ExecInitExpr
-		AfterTriggerBeginQuery

 	ExecutorRun
 		ExecProcNode --- recursively called in per-query context
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -251,11 +251,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	estate->es_top_eflags = eflags;
 	estate->es_instrument = queryDesc->instrument_options;

-	/*
-	 * Initialize the plan state tree
-	 */
-	InitPlan(queryDesc, eflags);
-
 	/*
 	 * Set up an AFTER-trigger statement context, unless told not to, or
 	 * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
@@ -263,6 +258,11 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
 	if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
 		AfterTriggerBeginQuery();

+	/*
+	 * Initialize the plan state tree
+	 */
+	InitPlan(queryDesc, eflags);
+
 	MemoryContextSwitchTo(oldcontext);
 }

@@ -1174,6 +1174,7 @@ CheckValidResultRel(ResultRelInfo *resultRelInfo, CmdType operation)
 			switch (operation)
 			{
 				case CMD_INSERT:
+
 					/*
 					 * If foreign partition to do tuple-routing for, skip the
 					 * check; it's disallowed elsewhere.
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -342,6 +342,9 @@ ExecInsert(ModifyTableState *mtstate,
 				mtstate->mt_transition_capture->tcs_map = NULL;
 			}
 		}
+		if (mtstate->mt_oc_transition_capture != NULL)
+			mtstate->mt_oc_transition_capture->tcs_map =
+				mtstate->mt_transition_tupconv_maps[leaf_part_index];

 		/*
 		 * We might need to convert from the parent rowtype to the partition
@@ -1157,6 +1160,8 @@ lreplace:;
 	/* AFTER ROW UPDATE Triggers */
 	ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, tuple,
 						 recheckIndexes,
+						 mtstate->operation == CMD_INSERT ?
+						 mtstate->mt_oc_transition_capture :
 						 mtstate->mt_transition_capture);

 	list_free(recheckIndexes);
@@ -1443,7 +1448,7 @@ fireASTriggers(ModifyTableState *node)
 			if (node->mt_onconflict == ONCONFLICT_UPDATE)
 				ExecASUpdateTriggers(node->ps.state,
 									 resultRelInfo,
-									 node->mt_transition_capture);
+									 node->mt_oc_transition_capture);
 			ExecASInsertTriggers(node->ps.state, resultRelInfo,
 								 node->mt_transition_capture);
 			break;
@@ -1473,14 +1478,24 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)

 	/* Check for transition tables on the directly targeted relation. */
 	mtstate->mt_transition_capture =
-		MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc);
+		MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
+								   RelationGetRelid(targetRelInfo->ri_RelationDesc),
+								   mtstate->operation);
+	if (mtstate->operation == CMD_INSERT &&
+		mtstate->mt_onconflict == ONCONFLICT_UPDATE)
+		mtstate->mt_oc_transition_capture =
+			MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
+									   RelationGetRelid(targetRelInfo->ri_RelationDesc),
+									   CMD_UPDATE);

 	/*
 	 * If we found that we need to collect transition tuples then we may also
 	 * need tuple conversion maps for any children that have TupleDescs that
-	 * aren't compatible with the tuplestores.
+	 * aren't compatible with the tuplestores.  (We can share these maps
+	 * between the regular and ON CONFLICT cases.)
 	 */
-	if (mtstate->mt_transition_capture != NULL)
+	if (mtstate->mt_transition_capture != NULL ||
+		mtstate->mt_oc_transition_capture != NULL)
 	{
 		ResultRelInfo *resultRelInfos;
 		int			numResultRelInfos;
@@ -1521,10 +1536,12 @@ ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate)
 		/*
 		 * Install the conversion map for the first plan for UPDATE and DELETE
 		 * operations.  It will be advanced each time we switch to the next
-		 * plan.  (INSERT operations set it every time.)
+		 * plan.  (INSERT operations set it every time, so we need not update
+		 * mtstate->mt_oc_transition_capture here.)
 		 */
-		mtstate->mt_transition_capture->tcs_map =
-			mtstate->mt_transition_tupconv_maps[0];
+		if (mtstate->mt_transition_capture)
+			mtstate->mt_transition_capture->tcs_map =
+				mtstate->mt_transition_tupconv_maps[0];
 	}
 }

@@ -1628,13 +1645,19 @@ ExecModifyTable(PlanState *pstate)
 				estate->es_result_relation_info = resultRelInfo;
 				EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
 									node->mt_arowmarks[node->mt_whichplan]);
+				/* Prepare to convert transition tuples from this child. */
 				if (node->mt_transition_capture != NULL)
 				{
-					/* Prepare to convert transition tuples from this child. */
 					Assert(node->mt_transition_tupconv_maps != NULL);
 					node->mt_transition_capture->tcs_map =
 						node->mt_transition_tupconv_maps[node->mt_whichplan];
 				}
+				if (node->mt_oc_transition_capture != NULL)
+				{
+					Assert(node->mt_transition_tupconv_maps != NULL);
+					node->mt_oc_transition_capture->tcs_map =
+						node->mt_transition_tupconv_maps[node->mt_whichplan];
+				}
 				continue;
 			}
 			else
@@ -1933,8 +1956,12 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		mtstate->mt_partition_tuple_slot = partition_tuple_slot;
 	}

-	/* Build state for collecting transition tuples */
-	ExecSetupTransitionCaptureState(mtstate, estate);
+	/*
+	 * Build state for collecting transition tuples.  This requires having a
+	 * valid trigger query context, so skip it in explain-only mode.
+	 */
+	if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
+		ExecSetupTransitionCaptureState(mtstate, estate);

 	/*
 	 * Initialize any WITH CHECK OPTION constraints if needed.
@@ -2317,16 +2344,6 @@ ExecEndModifyTable(ModifyTableState *node)
 {
 	int			i;

-	/*
-	 * Free transition tables, unless this query is being run in
-	 * EXEC_FLAG_SKIP_TRIGGERS mode, which means that it may have queued AFTER
-	 * triggers that won't be run till later.  In that case we'll just leak
-	 * the transition tables till end of (sub)transaction.
-	 */
-	if (node->mt_transition_capture != NULL &&
-		!(node->ps.state->es_top_eflags & EXEC_FLAG_SKIP_TRIGGERS))
-		DestroyTransitionCaptureState(node->mt_transition_capture);
-
 	/*
 	 * Allow any FDWs to shut down
 	 */
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -43,13 +43,21 @@ typedef struct TriggerData

 /*
 * The state for capturing old and new tuples into transition tables for a
- * single ModifyTable node.
+ * single ModifyTable node (or other operation source, e.g. copy.c).
+ *
+ * This is per-caller to avoid conflicts in setting tcs_map or
+ * tcs_original_insert_tuple.  Note, however, that the pointed-to
+ * private data may be shared across multiple callers.
 */
+struct AfterTriggersTableData;	/* private in trigger.c */
+
 typedef struct TransitionCaptureState
 {
 	/*
 	 * Is there at least one trigger specifying each transition relation on
 	 * the relation explicitly named in the DML statement or COPY command?
+	 * Note: in current usage, these flags could be part of the private state,
+	 * but it seems possibly useful to let callers see them.
 	 */
 	bool		tcs_delete_old_table;
 	bool		tcs_update_old_table;
@@ -60,7 +68,7 @@ typedef struct TransitionCaptureState
 	 * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
 	 * new and old tuples from a child table's format to the format of the
 	 * relation named in a query so that it is compatible with the transition
-	 * tuplestores.
+	 * tuplestores.  The caller must store the conversion map here if so.
 	 */
 	TupleConversionMap *tcs_map;

@@ -74,17 +82,9 @@ typedef struct TransitionCaptureState
 	HeapTuple	tcs_original_insert_tuple;

 	/*
-	 * The tuplestores backing the transition tables.  We use separate
-	 * tuplestores for INSERT and UPDATE, because INSERT ... ON CONFLICT ...
-	 * DO UPDATE causes INSERT and UPDATE triggers to fire and needs a way to
-	 * keep track of the new tuple images resulting from the two cases
-	 * separately.  We only need a single old image tuplestore, because there
-	 * is no statement that can both update and delete at the same time.
+	 * Private data including the tuplestore(s) into which to insert tuples.
 	 */
-	Tuplestorestate *tcs_old_tuplestore;	/* for DELETE and UPDATE old
-											 * images */
-	Tuplestorestate *tcs_insert_tuplestore; /* for INSERT new images */
-	Tuplestorestate *tcs_update_tuplestore; /* for UPDATE new images */
+	struct AfterTriggersTableData *tcs_private;
 } TransitionCaptureState;

 /*
@@ -174,8 +174,9 @@ extern void RelationBuildTriggers(Relation relation);
 extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);

 extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
-extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc);
-extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs);
+
+extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc,
+						   Oid relid, CmdType cmdType);

 extern void FreeTriggerDesc(TriggerDesc *trigdesc);

--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -983,7 +983,9 @@ typedef struct ModifyTableState
 	/* Per partition tuple conversion map */
 	TupleTableSlot *mt_partition_tuple_slot;
 	struct TransitionCaptureState *mt_transition_capture;
-	/* controls transition table population */
+	/* controls transition table population for specified operation */
+	struct TransitionCaptureState *mt_oc_transition_capture;
+	/* controls transition table population for INSERT...ON CONFLICT UPDATE */
 	TupleConversionMap **mt_transition_tupconv_maps;
 	/* Per plan/partition tuple conversion */
 } ModifyTableState;
--- a/src/test/regress/expected/triggers.out
+++ b/src/test/regress/expected/triggers.out
@@ -2217,6 +2217,23 @@ with wcte as (insert into table1 values (42))
  insert into table2 values ('hello world');
 NOTICE:  trigger = table2_trig, new table = ("hello world")
 NOTICE:  trigger = table1_trig, new table = (42)
+with wcte as (insert into table1 values (43))
+  insert into table1 values (44);
+NOTICE:  trigger = table1_trig, new table = (43), (44)
+select * from table1;
+ a  
+----
+ 42
+ 44
+ 43
+(3 rows)
+
+select * from table2;
+      a      
+-------------
+ hello world
+(1 row)
+
 drop table table1;
 drop table table2;
 --
@@ -2256,6 +2273,14 @@ create trigger my_table_multievent_trig
  after insert or update on my_table referencing new table as new_table
  for each statement execute procedure dump_insert();
 ERROR:  transition tables cannot be specified for triggers with more than one event
+--
+-- Verify that you can't create a trigger with transition tables with
+-- a column list.
+--
+create trigger my_table_col_update_trig
+  after update of b on my_table referencing new table as new_table
+  for each statement execute procedure dump_insert();
+ERROR:  transition tables cannot be specified for triggers with column lists
 drop table my_table;
 --
 -- Test firing of triggers with transition tables by foreign key cascades
@@ -2299,8 +2324,7 @@ select * from trig_table;
 (6 rows)

 delete from refd_table where length(b) = 3;
-NOTICE:  trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b")
-NOTICE:  trigger = trig_table_delete_trig, old table = (11,"one a"), (11,"one b")
+NOTICE:  trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b"), (11,"one a"), (11,"one b")
 select * from trig_table;
 a |    b    
 ---+---------
@@ -2309,6 +2333,30 @@ select * from trig_table;
 (2 rows)

 drop table refd_table, trig_table;
+--
+-- self-referential FKs are even more fun
+--
+create table self_ref (a int primary key,
+                       b int references self_ref(a) on delete cascade);
+create trigger self_ref_r_trig
+  after delete on self_ref referencing old table as old_table
+  for each row execute procedure dump_delete();
+create trigger self_ref_s_trig
+  after delete on self_ref referencing old table as old_table
+  for each statement execute procedure dump_delete();
+insert into self_ref values (1, null), (2, 1), (3, 2);
+delete from self_ref where a = 1;
+NOTICE:  trigger = self_ref_r_trig, old table = (1,), (2,1)
+NOTICE:  trigger = self_ref_r_trig, old table = (1,), (2,1)
+NOTICE:  trigger = self_ref_s_trig, old table = (1,), (2,1)
+NOTICE:  trigger = self_ref_r_trig, old table = (3,2)
+NOTICE:  trigger = self_ref_s_trig, old table = (3,2)
+-- without AR trigger, cascaded deletes all end up in one transition table
+drop trigger self_ref_r_trig on self_ref;
+insert into self_ref values (1, null), (2, 1), (3, 2), (4, 3);
+delete from self_ref where a = 1;
+NOTICE:  trigger = self_ref_s_trig, old table = (1,), (2,1), (3,2), (4,3)
+drop table self_ref;
 -- cleanup
 drop function dump_insert();
 drop function dump_update();
--- a/src/test/regress/sql/triggers.sql
+++ b/src/test/regress/sql/triggers.sql
@@ -1729,6 +1729,12 @@ create trigger table2_trig
 with wcte as (insert into table1 values (42))
  insert into table2 values ('hello world');

+with wcte as (insert into table1 values (43))
+  insert into table1 values (44);
+
+select * from table1;
+select * from table2;
+
 drop table table1;
 drop table table2;

@@ -1769,6 +1775,15 @@ create trigger my_table_multievent_trig
  after insert or update on my_table referencing new table as new_table
  for each statement execute procedure dump_insert();

+--
+-- Verify that you can't create a trigger with transition tables with
+-- a column list.
+--
+
+create trigger my_table_col_update_trig
+  after update of b on my_table referencing new table as new_table
+  for each statement execute procedure dump_insert();
+
 drop table my_table;

 --
@@ -1812,6 +1827,33 @@ select * from trig_table;

 drop table refd_table, trig_table;

+--
+-- self-referential FKs are even more fun
+--
+
+create table self_ref (a int primary key,
+                       b int references self_ref(a) on delete cascade);
+
+create trigger self_ref_r_trig
+  after delete on self_ref referencing old table as old_table
+  for each row execute procedure dump_delete();
+create trigger self_ref_s_trig
+  after delete on self_ref referencing old table as old_table
+  for each statement execute procedure dump_delete();
+
+insert into self_ref values (1, null), (2, 1), (3, 2);
+
+delete from self_ref where a = 1;
+
+-- without AR trigger, cascaded deletes all end up in one transition table
+drop trigger self_ref_r_trig on self_ref;
+
+insert into self_ref values (1, null), (2, 1), (3, 2), (4, 3);
+
+delete from self_ref where a = 1;
+
+drop table self_ref;
+
 -- cleanup
 drop function dump_insert();
 drop function dump_update();