Rework planning and execution of UPDATE and DELETE.

This patch makes two closely related sets of changes: 1. For UPDATE, the subplan of the ModifyTable node now only delivers the new values of the changed columns (i.e., the expressions computed in the query's SET clause) plus row identity information such as CTID. ModifyTable must re-fetch the original tuple to merge in the old values of any unchanged columns. The core advantage of this is that the changed columns are uniform across all tables of an inherited or partitioned target relation, whereas the other columns might not be. A secondary advantage, when the UPDATE involves joins, is that less data needs to pass through the plan tree. The disadvantage of course is an extra fetch of each tuple to be updated. However, that seems to be very nearly free in context; even worst-case tests don't show it to add more than a couple percent to the total query cost. At some point it might be interesting to combine the re-fetch with the tuple access that ModifyTable must do anyway to mark the old tuple dead; but that would require a good deal of refactoring and it seems it wouldn't buy all that much, so this patch doesn't attempt it. 2. For inherited UPDATE/DELETE, instead of generating a separate subplan for each target relation, we now generate a single subplan that is just exactly like a SELECT's plan, then stick ModifyTable on top of that. To let ModifyTable know which target relation a given incoming row refers to, a tableoid junk column is added to the row identity information. This gets rid of the horrid hack that was inheritance_planner(), eliminating O(N^2) planning cost and memory consumption in cases where there were many unprunable target relations. Point 2 of course requires point 1, so that there is a uniform definition of the non-junk columns to be returned by the subplan. We can't insist on uniform definition of the row identity junk columns however, if we want to keep the ability to have both plain and foreign tables in a partitioning hierarchy. Since it wouldn't scale very far to have every child table have its own row identity column, this patch includes provisions to merge similar row identity columns into one column of the subplan result. In particular, we can merge the whole-row Vars typically used as row identity by FDWs into one column by pretending they are type RECORD. (It's still okay for the actual composite Datums to be labeled with the table's rowtype OID, though.) There is more that can be done to file down residual inefficiencies in this patch, but it seems to be committable now. FDW authors should note several API changes: * The argument list for AddForeignUpdateTargets() has changed, and so has the method it must use for adding junk columns to the query. Call add_row_identity_var() instead of manipulating the parse tree directly. You might want to reconsider exactly what you're adding, too. * PlanDirectModify() must now work a little harder to find the ForeignScan plan node; if the foreign table is part of a partitioning hierarchy then the ForeignScan might not be the direct child of ModifyTable. See postgres_fdw for sample code. * To check whether a relation is a target relation, it's no longer sufficient to compare its relid to root->parse->resultRelation. Instead, check it against all_result_relids or leaf_result_relids, as appropriate. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
2025-10-28 11:55:03 +03:00 · 2021-03-31 11:52:34 -04:00
parent 055fee7eb4
commit 86dc90056d
55 changed files with 2353 additions and 2199 deletions
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -477,6 +477,206 @@ ExecBuildProjectionInfo(List *targetList,
 	return projInfo;
 }

+/*
+ *		ExecBuildUpdateProjection
+ *
+ * Build a ProjectionInfo node for constructing a new tuple during UPDATE.
+ * The projection will be executed in the given econtext and the result will
+ * be stored into the given tuple slot.  (Caller must have ensured that tuple
+ * slot has a descriptor matching the target rel!)
+ *
+ * subTargetList is the tlist of the subplan node feeding ModifyTable.
+ * We use this mainly to cross-check that the expressions being assigned
+ * are of the correct types.  The values from this tlist are assumed to be
+ * available from the "outer" tuple slot.  They are assigned to target columns
+ * listed in the corresponding targetColnos elements.  (Only non-resjunk tlist
+ * entries are assigned.)  Columns not listed in targetColnos are filled from
+ * the UPDATE's old tuple, which is assumed to be available in the "scan"
+ * tuple slot.
+ *
+ * relDesc must describe the relation we intend to update.
+ *
+ * This is basically a specialized variant of ExecBuildProjectionInfo.
+ * However, it also performs sanity checks equivalent to ExecCheckPlanOutput.
+ * Since we never make a normal tlist equivalent to the whole
+ * tuple-to-be-assigned, there is no convenient way to apply
+ * ExecCheckPlanOutput, so we must do our safety checks here.
+ */
+ProjectionInfo *
+ExecBuildUpdateProjection(List *subTargetList,
+						  List *targetColnos,
+						  TupleDesc relDesc,
+						  ExprContext *econtext,
+						  TupleTableSlot *slot,
+						  PlanState *parent)
+{
+	ProjectionInfo *projInfo = makeNode(ProjectionInfo);
+	ExprState  *state;
+	int			nAssignableCols;
+	bool		sawJunk;
+	Bitmapset  *assignedCols;
+	LastAttnumInfo deform = {0, 0, 0};
+	ExprEvalStep scratch = {0};
+	int			outerattnum;
+	ListCell   *lc,
+			   *lc2;
+
+	projInfo->pi_exprContext = econtext;
+	/* We embed ExprState into ProjectionInfo instead of doing extra palloc */
+	projInfo->pi_state.tag = T_ExprState;
+	state = &projInfo->pi_state;
+	state->expr = NULL;			/* not used */
+	state->parent = parent;
+	state->ext_params = NULL;
+
+	state->resultslot = slot;
+
+	/*
+	 * Examine the subplan tlist to see how many non-junk columns there are,
+	 * and to verify that the non-junk columns come before the junk ones.
+	 */
+	nAssignableCols = 0;
+	sawJunk = false;
+	foreach(lc, subTargetList)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+
+		if (tle->resjunk)
+			sawJunk = true;
+		else
+		{
+			if (sawJunk)
+				elog(ERROR, "subplan target list is out of order");
+			nAssignableCols++;
+		}
+	}
+
+	/* We should have one targetColnos entry per non-junk column */
+	if (nAssignableCols != list_length(targetColnos))
+		elog(ERROR, "targetColnos does not match subplan target list");
+
+	/*
+	 * Build a bitmapset of the columns in targetColnos.  (We could just use
+	 * list_member_int() tests, but that risks O(N^2) behavior with many
+	 * columns.)
+	 */
+	assignedCols = NULL;
+	foreach(lc, targetColnos)
+	{
+		AttrNumber	targetattnum = lfirst_int(lc);
+
+		assignedCols = bms_add_member(assignedCols, targetattnum);
+	}
+
+	/*
+	 * We want to insert EEOP_*_FETCHSOME steps to ensure the outer and scan
+	 * tuples are sufficiently deconstructed.  Outer tuple is easy, but for
+	 * scan tuple we must find out the last old column we need.
+	 */
+	deform.last_outer = nAssignableCols;
+
+	for (int attnum = relDesc->natts; attnum > 0; attnum--)
+	{
+		Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);
+
+		if (attr->attisdropped)
+			continue;
+		if (bms_is_member(attnum, assignedCols))
+			continue;
+		deform.last_scan = attnum;
+		break;
+	}
+
+	ExecPushExprSlots(state, &deform);
+
+	/*
+	 * Now generate code to fetch data from the outer tuple, incidentally
+	 * validating that it'll be of the right type.  The checks above ensure
+	 * that the forboth() will iterate over exactly the non-junk columns.
+	 */
+	outerattnum = 0;
+	forboth(lc, subTargetList, lc2, targetColnos)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		AttrNumber	targetattnum = lfirst_int(lc2);
+		Form_pg_attribute attr;
+
+		Assert(!tle->resjunk);
+
+		/*
+		 * Apply sanity checks comparable to ExecCheckPlanOutput().
+		 */
+		if (targetattnum <= 0 || targetattnum > relDesc->natts)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("table row type and query-specified row type do not match"),
+					 errdetail("Query has too many columns.")));
+		attr = TupleDescAttr(relDesc, targetattnum - 1);
+
+		if (attr->attisdropped)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("table row type and query-specified row type do not match"),
+					 errdetail("Query provides a value for a dropped column at ordinal position %d.",
+							   targetattnum)));
+		if (exprType((Node *) tle->expr) != attr->atttypid)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("table row type and query-specified row type do not match"),
+					 errdetail("Table has type %s at ordinal position %d, but query expects %s.",
+							   format_type_be(attr->atttypid),
+							   targetattnum,
+							   format_type_be(exprType((Node *) tle->expr)))));
+
+		/*
+		 * OK, build an outer-tuple reference.
+		 */
+		scratch.opcode = EEOP_ASSIGN_OUTER_VAR;
+		scratch.d.assign_var.attnum = outerattnum++;
+		scratch.d.assign_var.resultnum = targetattnum - 1;
+		ExprEvalPushStep(state, &scratch);
+	}
+
+	/*
+	 * Now generate code to copy over any old columns that were not assigned
+	 * to, and to ensure that dropped columns are set to NULL.
+	 */
+	for (int attnum = 1; attnum <= relDesc->natts; attnum++)
+	{
+		Form_pg_attribute attr = TupleDescAttr(relDesc, attnum - 1);
+
+		if (attr->attisdropped)
+		{
+			/* Put a null into the ExprState's resvalue/resnull ... */
+			scratch.opcode = EEOP_CONST;
+			scratch.resvalue = &state->resvalue;
+			scratch.resnull = &state->resnull;
+			scratch.d.constval.value = (Datum) 0;
+			scratch.d.constval.isnull = true;
+			ExprEvalPushStep(state, &scratch);
+			/* ... then assign it to the result slot */
+			scratch.opcode = EEOP_ASSIGN_TMP;
+			scratch.d.assign_tmp.resultnum = attnum - 1;
+			ExprEvalPushStep(state, &scratch);
+		}
+		else if (!bms_is_member(attnum, assignedCols))
+		{
+			/* Certainly the right type, so needn't check */
+			scratch.opcode = EEOP_ASSIGN_SCAN_VAR;
+			scratch.d.assign_var.attnum = attnum - 1;
+			scratch.d.assign_var.resultnum = attnum - 1;
+			ExprEvalPushStep(state, &scratch);
+		}
+	}
+
+	scratch.opcode = EEOP_DONE;
+	ExprEvalPushStep(state, &scratch);
+
+	ExecReadyExpr(state);
+
+	return projInfo;
+}
+
 /*
 * ExecPrepareExpr --- initialize for expression execution outside a normal
 * Plan tree context.