Commit cbc127917e introduced tracking of unpruned relids to avoid
processing pruned relations, and changed ExecInitModifyTable() to
initialize only unpruned result relations. As a result, MERGE
statements that prune all target partitions can now lead to crashes
or incorrect behavior during execution.
The crash occurs because some executor code paths rely on
ModifyTableState.resultRelInfo[0] being present and initialized,
even when no result relations remain after pruning. For example,
ExecMerge() and ExecMergeNotMatched() use the first resultRelInfo
to determine the appropriate action. Similarly,
ExecInitPartitionInfo() assumes that at least one result relation
exists.
To preserve these assumptions, ExecInitModifyTable() now includes the
first result relation in the initialized result relation list if all
result relations for that ModifyTable were pruned. To enable that,
ExecDoInitialPruning() ensures the first relation is locked if it was
pruned and locking is necessary.
To support this exception to the pruning logic, PlannedStmt now
includes a list of RT indexes identifying the first result relation
of each ModifyTable node in the plan. This allows
ExecDoInitialPruning() to check whether each such relation was
pruned and, if so, lock it if necessary.
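As an illustration, a minimal sketch of the kind of statement that
could hit this (table and statement names are hypothetical); forcing a
generic plan makes the target-partition pruning happen at executor
startup:

    CREATE TABLE t (a int) PARTITION BY LIST (a);
    CREATE TABLE t1 PARTITION OF t FOR VALUES IN (1);
    SET plan_cache_mode = force_generic_plan;
    PREPARE m(int) AS
      MERGE INTO t USING (VALUES (1)) AS s(a)
        ON t.a = s.a AND t.a = $1
      WHEN NOT MATCHED THEN INSERT VALUES (s.a);
    EXECUTE m(2);  -- $1 = 2 prunes every target partition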
Bug: #18830
Reported-by: Robins Tharakan <tharakan@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/18830-1f31ea1dc930d444%40postgresql.org
Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in a plan's range table to ensure the
plan is safe for execution. However, this locks runtime-prunable
relations that will later be pruned during "initial" runtime pruning,
introducing unnecessary overhead.
This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.
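As an illustration, a sketch of a workload that benefits (names are
hypothetical): a generic plan over a table with many partitions, where
initial pruning leaves only one partition to lock:

    CREATE TABLE orders (id int, created date) PARTITION BY RANGE (created);
    -- ... many partitions ...
    SET plan_cache_mode = force_generic_plan;
    PREPARE q(date) AS SELECT * FROM orders WHERE created = $1;
    EXECUTE q('2025-01-01');  -- after this change, only the surviving
                              -- partition is locked at executor startup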
* Changes to locking when executing generic plans:
AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily. The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.
This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.
* Plan invalidation handling:
Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure all code paths that may be affected by this handle
invalidation properly, all callers of ExecutorStart that may execute a
PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.
UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.
ExecutorStart() and ExecutorStart_hook implementations now return a
boolean indicating whether plan initialization succeeded: true if a
valid PlanState tree was built in QueryDesc.planstate, false
otherwise, in which case QueryDesc.planstate is NULL. Hook
implementations must call standard_ExecutorStart() at the beginning
and, if it returns false, return false themselves without proceeding.
* Testing:
To verify these changes, the delay_execution module tests scenarios
where cached plans become invalid due to concurrent changes to
prunable relations made during the window created by deferred locking.
* Note to extension authors:
ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:
    if (prev_ExecutorStart)
        plan_valid = prev_ExecutorStart(queryDesc, eflags);
    else
        plan_valid = standard_ExecutorStart(queryDesc, eflags);

    if (!plan_valid)
        return false;

    <extension-code>

    return true;
Extensions accessing child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure their RT indexes are
present in es_unpruned_relids (introduced in commit cbc127917e), or
they will encounter an error. This is a strict requirement after this
change, as only relations in that set are locked.
The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
This adds a new variant of generated columns that are computed on read
(like a view, unlike the existing stored generated columns, which are
computed on write, like a materialized view).
The syntax for the column definition is
... GENERATED ALWAYS AS (...) VIRTUAL
where the keyword VIRTUAL is optional. VIRTUAL is the default rather than
STORED to match various other SQL products. (The SQL standard makes
no specification about this, but it also doesn't know about VIRTUAL or
STORED.) (Also, virtual views are the default, rather than
materialized views.)
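For example (an illustrative sketch):

    CREATE TABLE people (
        height_cm numeric,
        height_in numeric GENERATED ALWAYS AS (height_cm / 2.54) VIRTUAL
    );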
Virtual generated columns are stored in tuples as null values. (A
very early version of this patch had the ambition to not store them at
all. But so much stuff breaks or gets confused if you have tuples
where a column in the middle is completely missing. This is a
compromise, and it still saves space over being forced to use stored
generated columns. If we ever find a way to improve this, a bit of
pg_upgrade cleverness could allow for upgrades to a newer scheme.)
The capabilities and restrictions of virtual generated columns are
mostly the same as for stored generated columns. In some cases, this
patch keeps virtual generated columns more restricted than they might
technically need to be, to keep the two kinds consistent. Some of
that could maybe be relaxed later after separate careful
considerations.
Some functionality that is currently not supported, but could possibly
be added as incremental features, some easier than others:
- index on or using a virtual column
- hence also no unique constraints on virtual columns
- extended statistics on virtual columns
- foreign-key constraints on virtual columns
- not-null constraints on virtual columns (check constraints are supported)
- ALTER TABLE / DROP EXPRESSION
- virtual column cannot have domain type
- virtual columns are not supported in logical replication
The tests in generated_virtual.sql have been copied over from
generated_stored.sql with the keyword replaced. This way we can make
sure the behavior is mostly aligned, and the differences can be
visible. Some tests for features that are not yet supported are
commented out.
Reviewed-by: Jian He <jian.universality@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Shlok Kyal <shlok.kyal.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b0a6@eisentraut.org
This commit introduces changes to track unpruned relations explicitly,
making it possible for top-level plan nodes, such as ModifyTable and
LockRows, to avoid processing partitions pruned during initial
pruning. Scan-level nodes, such as Append and MergeAppend, already
avoid the unnecessary processing by accessing partition pruning
results directly via part_prune_index. In contrast, top-level nodes
cannot access pruning results directly and need to determine which
partitions remain unpruned.
To address this, this commit introduces a new bitmapset field,
es_unpruned_relids, which the executor uses to track the set of
unpruned relations. This field is referenced during plan
initialization to skip initializing certain nodes for pruned
partitions. It is initialized with PlannedStmt.unprunableRelids,
a new field that the planner populates with RT indexes of relations
that cannot be pruned during runtime pruning. These include relations
not subject to partition pruning and those required for execution
regardless of pruning.
PlannedStmt.unprunableRelids is computed during set_plan_refs() by
removing the RT indexes of runtime-prunable relations, identified
from PartitionPruneInfos, from the full set of relation RT indexes.
ExecDoInitialPruning() then updates es_unpruned_relids by adding
partitions that survive initial pruning.
To support this, PartitionedRelPruneInfo and PartitionedRelPruningData
now include a leafpart_rti_map[] array that maps partition indexes to
their corresponding RT indexes. The former is used in set_plan_refs()
when constructing unprunableRelids, while the latter is used in
ExecDoInitialPruning() to convert partition indexes returned by
get_matching_partitions() into RT indexes, which are then added to
es_unpruned_relids.
These changes make it possible for ModifyTable and LockRows nodes to
process only relations that remain unpruned after initial pruning.
ExecInitModifyTable() trims lists, such as resultRelations,
withCheckOptionLists, returningLists, and updateColnosLists, to
consider only unpruned partitions. It also creates ResultRelInfo
structs only for these partitions. Similarly, child RowMarks for
pruned relations are skipped.
By avoiding unnecessary initialization of structures for pruned
partitions, these changes improve the performance of updates and
deletes on partitioned tables during initial runtime pruning.
As a result of the ExecInitModifyTable() changes described above,
EXPLAIN of an UPDATE or DELETE plan that performs initial runtime
pruning no longer lists the pruned partitions.
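For example (a sketch with hypothetical names; forcing a generic plan
makes the pruning happen at executor startup):

    SET plan_cache_mode = force_generic_plan;
    PREPARE d(int) AS DELETE FROM orders WHERE region_id = $1;
    EXPLAIN (COSTS OFF) EXECUTE d(1);  -- pruned partitions of orders
                                       -- are no longer shown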
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
This commit builds on the prior change that moved PartitionPruneInfos
out of individual plan nodes into a list in PlannedStmt, making it
possible to initialize PartitionPruneStates without traversing the
plan tree and perform runtime initial pruning before ExecInitNode()
initializes the plan trees. These tasks are now handled in a new
routine, ExecDoInitialPruning(), which is called by InitPlan()
before calling ExecInitNode() on various plan trees.
ExecDoInitialPruning() performs the initial pruning and saves the
result -- a Bitmapset of indexes for surviving child subnodes -- in
es_part_prune_results, a list in EState.
PartitionPruneStates created for initial pruning are stored in
es_part_prune_states, another list in EState, for later use during
exec pruning. Both lists are parallel to es_part_prune_infos, which
holds the PartitionPruneInfos from PlannedStmt, enabling shared
indexing.
PartitionPruneStates initialized in ExecDoInitialPruning() now
include only the PartitionPruneContexts for initial pruning steps.
Exec pruning contexts are initialized later in
ExecInitPartitionExecPruning() when the parent plan node is
initialized, as the exec pruning step expressions depend on the parent
node's PlanState.
The existing function PartitionPruneFixSubPlanMap() has been
repurposed for this initialization, avoiding duplication of the loop
that finds the PartitionedRelPruningData whose exec pruning contexts
need initializing. It has been renamed to InitExecPruningContexts()
to reflect its new primary responsibility.
The original logic to "fix subplan maps" remains intact but is now
encapsulated within the renamed function.
This commit removes two obsolete Asserts in partkey_datum_from_expr().
The ExprContext used for pruning expression evaluation is now
independent of the parent PlanState, making these Asserts unnecessary.
By centralizing pruning logic and decoupling it from the plan
initialization step (ExecInitNode()), this change sets the stage for
future patches that will use the result of initial pruning to
save the overhead of redundant processing for pruned partitions.
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
This moves PartitionPruneInfo from plan nodes to PlannedStmt,
centralizing all PartitionPruneInfo structures for the main query and
its subqueries in a single list and thereby simplifying traversal.
Instead of storing PartitionPruneInfo pointers directly, plan nodes
(Append or MergeAppend) now reference an index into this list.
A bitmapset field is added to PartitionPruneInfo to store the RT
indexes corresponding to the apprelids field in Append or MergeAppend.
This allows execution pruning logic to verify that it operates on the
correct plan node, mainly to facilitate debugging.
Duplicated code in set_append_references() and
set_mergeappend_references() is refactored into a new function,
register_pruneinfo(). This updates RT indexes by applying rtoffset
and adds PartitionPruneInfo to the global list in PlannerGlobal.
By allowing pruning to be performed without traversing the plan tree,
this change lays the groundwork for runtime initial pruning to occur
independently of plan tree initialization.
Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> (earlier version)
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
This allows the RETURNING list of INSERT/UPDATE/DELETE/MERGE queries
to explicitly return old and new values by using the special aliases
"old" and "new", which are automatically added to the query (if not
already defined) while parsing its RETURNING list, allowing things
like:
RETURNING old.colname, new.colname, ...
RETURNING old.*, new.*
Additionally, a new syntax is supported, allowing the names "old" and
"new" to be changed to user-supplied alias names, e.g.:
RETURNING WITH (OLD AS o, NEW AS n) o.colname, n.colname, ...
This is useful when the names "old" and "new" are already defined,
such as inside trigger functions, allowing backwards compatibility to
be maintained -- the interpretation of any existing queries that
happen to already refer to relations called "old" or "new", or use
those as aliases for other relations, is not changed.
For an INSERT, old values will generally be NULL, and for a DELETE,
new values will generally be NULL, but that may change for an INSERT
with an ON CONFLICT ... DO UPDATE clause, or if a query rewrite rule
changes the command type. Therefore, we put no restrictions on the use
of old and new in any DML queries.
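For example (an illustrative sketch):

    UPDATE accounts SET balance = balance + 100
      WHERE id = 1
      RETURNING old.balance AS old_balance, new.balance AS new_balance;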
Dean Rasheed, reviewed by Jian He and Jeff Davis.
Discussion: https://postgr.es/m/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50AgOe+pvisQ@mail.gmail.com
Robert Haas expressed concern about whether commit 3eea7a0c9 exposed
the parallel-execution machinery to a case it isn't tested for, namely
a second non-parallel execution of a plan after a parallel execution.
Investigation shows that that can't happen because of pquery.c's
manipulation of the scan direction, but it sure wasn't obvious to
start with. Add some commentary about that.
Discussion: https://postgr.es/m/CA+TgmoagyKQy=HFw+wLo0AKTYWHui+iKswZ8Jnqqd-cFby-WVg@mail.gmail.com
Our parallel-mode code only works when we are executing a query
in full, so ExecutePlan must disable parallel mode when it is
asked to do partial execution. The previous logic for this
involved passing down a flag (variously named execute_once or
run_once) from callers of ExecutorRun or PortalRun. This is
overcomplicated, and unsurprisingly some of the callers didn't
get it right, since it requires keeping state that not all of
them have handy; not to mention that the requirements for it were
undocumented. That led to assertion failures in some corner
cases. The only state we really need for this is the existing
QueryDesc.already_executed flag, so let's just put all the
responsibility in ExecutePlan. (It could have been done in
ExecutorRun too, leading to a slightly shorter patch -- but if
there's ever more than one caller of ExecutePlan, it seems better
to have this logic in the subroutine than the callers.)
This makes those ExecutorRun/PortalRun parameters unnecessary.
In master it seems okay to just remove them, returning the
API for those functions to what it was before parallelism.
Such an API break is clearly not okay in stable branches,
but for them we can just leave the parameters in place after
documenting that they do nothing.
Per report from Yugo Nagata, who also reviewed and tested
this patch. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20241206062549.710dc01cf91224809dd6c0e1@sraoss.co.jp
Two attributes are added to pg_stat_database:
* parallel_workers_to_launch, counting the total number of parallel
workers that were planned to be launched.
* parallel_workers_launched, counting the total number of parallel
workers actually launched.
The ratio of both fields can provide hints that there are not enough
slots available when launching parallel workers, also useful when
pg_stat_statements is not deployed on an instance (i.e. cf54a2c002).
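For example, the two counters can be compared per database (an
illustrative query):

    SELECT datname,
           parallel_workers_launched,
           parallel_workers_to_launch
    FROM pg_stat_database
    WHERE datname = current_database();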
This commit relies on de3a2ea3b2, that has added two fields to EState,
that get incremented when executing Gather or GatherMerge nodes.
A test is added in select_parallel, where parallel workers are spawned.
Bump catalog version.
Author: Benoit Lobréau
Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com
This assertion has been added by 24f5205948, but Alexander Lakhin has
proved that the ExecutorRun() one can be broken by using a PL function
that manipulates compute_query_id and track_activities, while the ones
in ExecutorFinish() and ExecutorEnd() could be triggered when cleaning
up portals at the beginning of a new query execution.
Discussion: https://postgr.es/m/b37d8e6c-e83d-e157-8865-1b2460a6aef2@gmail.com
As formulated, the assertion added in the executor by 24f5205948 to
check that a query ID is set had two problems:
- track_activities may be disabled while compute_query_id is enabled,
causing the query ID to not be reported to pg_stat_activity.
- debug_query_string may not be set in some context. The only path
where this would matter is visibly autovacuum, should parallel workers
be enabled there at some point. This is not the case currently.
There was no test showing the interactions between the query ID and
track_activities, so let's add one based on a scan of pg_stat_activity.
This assertion is still experimental at this stage, but let's see
if it shows more paths where query IDs are not properly set when
they should be.
Discussion: https://postgr.es/m/Zvn5616oYXmpXyHI@paquier.xyz
The previous commit fixed some ways of losing an inplace update. It
remained possible to lose one when a backend working toward a
heap_update() copied a tuple into memory just before inplace update of
that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE
to govern admission to the steps of copying an old tuple, modifying it,
and issuing heap_update(). This includes MERGE commands. To avoid
changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when
holding a relation lock sufficient to exclude inplace updaters.
Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE
pg_class" or "UPDATE pg_database" can still lose an inplace update. The
v14+ UPDATE fix needs commit 86dc90056d,
and it wasn't worth reimplementing that fix without such infrastructure.
Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas.
Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
This commit adds three sanity checks in code paths of the executor where
it is possible to use hooks, checking that a query ID is reported in
pg_stat_activity if compute_query_id is enabled:
- ExecutorRun()
- ExecutorFinish()
- ExecutorEnd()
This causes the test in pg_stat_statements added in 933848d16d to
complain immediately in ExecutorRun(). The idea behind this commit is
to help extensions to detect if they are missing query ID reports when a
query goes through the executor. Perhaps this will prove to be a bad
idea, but let's see where this experience goes in v18 and newer
versions.
Reviewed-by: Sami Imseih
Discussion: https://postgr.es/m/ZuJb5xCKHH0A9tMN@paquier.xyz
This commit adds query ID reports for two code paths when processing
extended query protocol messages:
- When receiving a bind message, setting it to the first Query retrieved
from the plan cache.
- When receiving an execute message, setting it to the first PlannedStmt
stored in a portal.
An advantage of this method is that it covers all the types of
portals handled in the extended query protocol, particularly the
following two cases where the report done in ExecutorStart() is not
enough (nor would an addition in ExecutorRun() be, for the second one):
- Multiple execute messages, with multiple ExecutorRun().
- Portal with execute/fetch messages, like a query with a RETURNING
clause and a fetch size that stores the tuples in a first execute
message going through ExecutorStart() and ExecutorRun(), followed by one
or more execute messages doing only fetches from the tuplestore created
in the first message. This corresponds to the case where
execute_is_fetch is set, for example.
Note that the query ID reporting done in ExecutorStart() is still
necessary, as an EXECUTE requires it. Query ID reporting is optimistic
and more calls to pgstat_report_query_id() don't matter as the first
report takes priority except if the report is forced. The comment in
ExecutorStart() is adjusted to reflect better the reality with the
extended query protocol.
The test added in pg_stat_statements is a courtesy of Robert Haas. This
uses psql's \bind metacommand, hence this part is backpatched down to
v16.
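For example, a check in the spirit of that test, using \bind to force
the extended query protocol (an illustrative sketch, assuming
compute_query_id is enabled):

    SELECT query_id IS NOT NULL AS has_query_id
      FROM pg_stat_activity WHERE pid = pg_backend_pid() \bind \g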
Reported-by: Kaido Vaikla, Erik Wienhold
Author: Sami Imseih
Reviewed-by: Jian He, Andrei Lepikhov, Michael Paquier
Discussion: https://postgr.es/m/CA+427g8DiW3aZ6pOpVgkPbqK97ouBdf18VLiHFesea2jUk3XoQ@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZxtnf_jZ=VqBSyaU8hfUkkwoJCJ6ufy4LGpXaunKrjrg@mail.gmail.com
Discussion: https://postgr.es/m/1391613709.939460.1684777418070@office.mailbox.org
Backpatch-through: 14
This patch provides additional logging information in the following
conflict scenarios while applying changes:
insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint.
update_differ: Updating a row that was previously modified by another origin.
update_exists: The updated row value violates a NOT DEFERRABLE unique constraint.
update_missing: The tuple to be updated is missing.
delete_differ: Deleting a row that was previously modified by another origin.
delete_missing: The tuple to be deleted is missing.
For insert_exists and update_exists conflicts, the log can include the origin
and commit timestamp details of the conflicting key with track_commit_timestamp
enabled.
update_differ and delete_differ conflicts can only be detected when
track_commit_timestamp is enabled on the subscriber.
We do not offer additional logging for exclusion constraint violations because
these constraints can specify rules that are more complex than simple equality
checks. Resolving such conflicts won't be straightforward. This area can be
further enhanced if required.
Author: Hou Zhijie
Reviewed-by: Shveta Malik, Amit Kapila, Nisha Moond, Hayato Kuroda, Dilip Kumar
Discussion: https://postgr.es/m/OS0PR01MB5716352552DFADB8E9AD1D8994C92@OS0PR01MB5716.jpnprd01.prod.outlook.com
This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE
actions, which operate on rows that exist in the target relation, but
not in the data source. These actions can execute UPDATE, DELETE, or
DO NOTHING sub-commands.
This is in contrast to already-supported WHEN NOT MATCHED actions,
which operate on rows that exist in the data source, but not in the
target relation. To make this distinction clearer, such actions may
now be written as WHEN NOT MATCHED BY TARGET.
Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is
equivalent to writing WHEN NOT MATCHED BY TARGET.
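For example (an illustrative sketch that also deletes target rows
absent from the source):

    MERGE INTO target AS t
    USING source AS s
    ON t.id = s.id
    WHEN MATCHED THEN
      UPDATE SET val = s.val
    WHEN NOT MATCHED BY TARGET THEN
      INSERT (id, val) VALUES (s.id, s.val)
    WHEN NOT MATCHED BY SOURCE THEN
      DELETE;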
Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing.
Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com
The parallel query infrastructure copies the leader backend's active
snapshot to the worker processes. But the BitmapHeapScan node also had
bespoke code to pass the snapshot from the leader to the workers. That was
redundant, so remove it.
The removed code was analogous to the snapshot serialization in
table_parallelscan_initialize(), but that was the wrong role model. A
parallel bitmap heap scan is more like an independent non-parallel
bitmap heap scan in each parallel worker as far as the table AM is
concerned, because the coordination is done in nodeBitmapHeapscan.c,
and the table AM doesn't need to know anything about it.
This relies on the assumption that es_snapshot ==
GetActiveSnapshot(). That's not a new assumption; things would get
weird if you used the QueryDesc's snapshot for visibility checks in
the scans but the active snapshot for evaluating quals, for example.
This could use some refactoring and cleanup, but for now, just add
some assertions.
Reviewed-by: Dilip Kumar, Robert Haas
Discussion: https://www.postgresql.org/message-id/5f3b9d59-0f43-419d-80ca-6d04c07cf61a@iki.fi
as determined by include-what-you-use (IWYU)
While IWYU also suggests to *add* a bunch of #include's (which is its
main purpose), this patch does not do that. In some cases, a more
specific #include replaces another less specific one.
Some manual adjustments of the automatic result:
- IWYU currently doesn't know about includes that provide global
variable declarations (like -Wmissing-variable-declarations), so
those includes are being kept manually.
- All includes for port(ability) headers are being kept for now, to
play it safe.
- No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the
patch from exploding in size.
Note that this patch touches just *.c files, so nothing declared in
header files changes in hidden ways.
As a small example, in src/backend/access/transam/rmgr.c, some IWYU
pragma annotations are added to handle a special case there.
Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
This allows the target relation of MERGE to be an auto-updatable or
trigger-updatable view, and includes support for WITH CHECK OPTION,
security barrier views, and security invoker views.
A trigger-updatable view must have INSTEAD OF triggers for every type
of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command.
An auto-updatable view must not have any INSTEAD OF triggers. Mixing
auto-update and trigger-update actions (i.e., having a partial set of
INSTEAD OF triggers) is not supported.
Rule-updatable views are also not supported, since there is no
rewriter support for non-SELECT rules with MERGE operations.
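For example (an illustrative sketch with an auto-updatable view):

    CREATE VIEW active_accounts AS
      SELECT * FROM accounts WHERE active
      WITH CHECK OPTION;

    MERGE INTO active_accounts AS t
    USING new_accounts AS s
    ON t.id = s.id
    WHEN MATCHED THEN
      UPDATE SET balance = s.balance
    WHEN NOT MATCHED THEN
      INSERT (id, balance, active) VALUES (s.id, s.balance, true);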
Dean Rasheed, reviewed by Jian He and Alvaro Herrera.
Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com
47bb9db75 modified the ApplyRetrieveRule()'s conversion of a view's
original RTE_RELATION entry into an RTE_SUBQUERY one to retain relid,
rellockmode, and perminfoindex so that the executor can lock the view
and check its permissions. It seems better to also retain relkind,
to cross-check that the exception allowing an RTE_SUBQUERY entry to
carry relation details applies only to views; so do so.
Bump catversion because this changes the output format of
RTE_SUBQUERY RTEs.
Suggested-by: David Steele <david@pgmasters.net>
Reviewed-by: David Steele <david@pgmasters.net>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/3953179e-9540-e5d1-a743-4bef368785b0%40pgmasters.net
The idea of EvalPlanQual is that we replace the query's scan of the
result relation with a single injected tuple, and see if we get a
tuple out, thereby implying that the injected tuple still passes the
query quals. (In join cases, other relations in the query are still
scanned normally.) This logic was not updated when commit 86dc90056
made it possible for a single DML query plan to have multiple result
relations, when the query target relation has inheritance or partition
children. We replaced the output for the current result relation
successfully, but other result relations were still scanned normally;
thus, if any other result relation contained a tuple satisfying the
quals, we'd think the EPQ check passed, even if it did not pass for
the injected tuple itself. This would lead to update or delete
actions getting performed when they should have been skipped due to
a conflicting concurrent update in READ COMMITTED isolation mode.
Fix by blocking all sibling result relations from emitting tuples
during an EvalPlanQual recheck. In the back branches, the fix is
complicated a bit by the need to not change the size of struct
EPQState (else we'd have ABI-breaking changes in offsets in
struct ModifyTableState). Like the back-patches of 3f7836ff6
and 4b3e37993, add a separately palloc'd struct to avoid that.
The logic is the same as in HEAD otherwise.
This is only a live bug back to v14 where 86dc90056 came in.
However, I chose to back-patch the test cases further, on the
grounds that this whole area is none too well tested. I skipped
doing so in v11 though because none of the test applied cleanly,
and it didn't quite seem worth extra work for a branch with only
six months to live.
Per report from Ante Krešić (via Aleksander Alekseev)
Discussion: https://postgr.es/m/CAJ7c6TMBTN3rcz4=AjYhLPD_w3FFT0Wq_C15jxCDn8U4tZnH1g@mail.gmail.com
RI_Initial_Check was setting up the list of RTEPermissionInfo entries
for ExecCheckPermissions() incorrectly, and the problem is subtle
enough that it
doesn't have any immediate effect in core code. However, if an
extension is using the ExecutorCheckPerms_hook, then it would get the
wrong parameters and perhaps arrive at a wrong conclusion, or outright
malfunction. Fix by constructing that list and the RTE list more
honestly.
We also add an assertion check to verify that these lists match. This
new assertion would have caught this bug.
Co-authored-by: Олег Целебровский (Oleg Tselebrovskii) <o.tselebrovskiy@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/3722b7a2cbe27a1796ee40824bd86dd1@postgrespro.ru
pg_stat_statements relies on EState->es_processed to count the number of
rows processed by ExecutorRun(). This proves to be a problem under the
extended query protocol when the result of a query is fetched through
more than one call of ExecutorRun(), as es_processed is reset each time
ExecutorRun() is called. This causes pg_stat_statements to report the
number of rows calculated in the last execute fetch, rather than the
global sum of all the rows processed.
As pquery.c tells, this is a problem when a portal does not use
holdStore. For example, DMLs with RETURNING would report a correct
tuple count as these do one execution cycle when the query is first
executed to fill in the portal's store with one ExecutorRun(), feeding
on the portal's store for each follow-up execute fetch depending on the
fetch size requested by the client.
The fix proposed for this issue is simple with the addition of an extra
counter in EState that's preserved across multiple ExecutorRun() calls,
incremented with the value calculated in es_processed. This approach is
not back-patchable, unfortunately.
Note that libpq does not currently give any way to control the fetch
size when using the extended v3 protocol, meaning that in-core testing
is not possible yet. This issue can be easily verified with the JDBC
driver, though, with *autocommit disabled*. Hence, having in-core tests
requires more features, left for future discussion:
- At least two new libpq routines splitting PQsendQueryGuts(), one for
the bind/describe and a second for a series of execute fetches with a
custom fetch size, likely in a fashion similar to what JDBC does.
- A psql meta-command for the execute phase. This part is not strictly
mandatory, still it could be handy.
Reported-by: Andrew Dunstan (original discovery by Simon Siggs)
Author: Sami Imseih
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/EBE6C507-9EB6-4142-9E4D-38B1673363A7@amazon.com
Discussion: https://postgr.es/m/c90890e7-9c89-c34f-d3c5-d5c763a34bd8@dunslane.net
This provides a very simple way to see the generic plan for a
parameterized query. Without this, it's necessary to define
a prepared statement and temporarily change plan_cache_mode,
which is a bit tedious.
One thing that's a bit of a hack perhaps is that we disable
execution-time partition pruning when the GENERIC_PLAN option
is given. That's because the pruning code may attempt to
fetch the value of one of the parameters, which would fail.
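For example (an illustrative sketch):

    EXPLAIN (GENERIC_PLAN)
      SELECT * FROM orders WHERE order_id = $1;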
Laurenz Albe, reviewed by Julien Rouhaud, Christoph Berg,
Michel Pelletier, Jim Jones, and myself
Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel@cybertec.at
If UPDATE is forced to retry after an EvalPlanQual check, it neglected
to repeat GENERATED-column computations, even though those might well
have changed since we're dealing with a different tuple than before.
Fixing this is mostly a matter of looping back a bit further when
we retry. In v15 and HEAD that's most easily done by altering the API
of ExecUpdateAct so that it includes computing GENERATED expressions.
Also, if an UPDATE in a partitioned table turns into a cross-partition
INSERT operation, we failed to recompute GENERATED columns. That's a
bug since 8bf6ec3ba allowed partitions to have different generation
expressions; although it seems to have no ill effects before that.
Fixing this is messier because we can now have situations where the same
query needs both the UPDATE-aligned set of GENERATED columns and the
INSERT-aligned set, and it's unclear which set will be generated first
(else we could hack things by forcing the INSERT-aligned set to be
generated, which is indeed how fe9e658f4 made it work for MERGE).
The best fix seems to be to build and store separate sets of expressions
for the INSERT and UPDATE cases. That would create ABI issues in the
back branches, but so far it seems we can leave this alone in the back
branches.
Per bug #17823 from Hisahiro Kauchi. The first part of this affects all
branches back to v12 where GENERATED columns were added.
Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
While testing a fix for bug #17823, I discovered that EvalPlanQualStart
failed to copy es_rteperminfos from the parent EState, resulting in
failure if anything in EPQ execution wanted to consult that information.
This led me to conclude that commit a61b1f748 had been too haphazard
about where to fill es_rteperminfos, and that we need to be sure that
that happens exactly where es_range_table gets filled. So I changed the
signature of ExecInitRangeTable to help ensure that this new requirement
doesn't get missed. (Indeed, pgoutput.c was also failing to fill it.
Maybe we don't ever need it there, but I wouldn't bet on that.)
No test case yet; one will arrive with the fix for #17823.
But that needs to be back-patched, while this fix is HEAD-only.
Discussion: https://postgr.es/m/17823-b64909cf7d63de84@postgresql.org
We were identifying the updatable generated columns of inheritance
children by transposing the calculation made for their parent.
However, there's nothing that says a traditional-inheritance child
can't have generated columns that aren't there in its parent, or that
have different dependencies than are in the parent's expression.
(At present it seems that we don't enforce that for partitioning
either, which is likely wrong to some degree or other; but the case
clearly needs to be handled with traditional inheritance.)
Hence, drop the very-klugy-anyway "extraUpdatedCols" RTE field
in favor of identifying which generated columns depend on updated
columns during executor startup. In HEAD we can remove
extraUpdatedCols altogether; in back branches, it's still there but
always empty. Another difference between the HEAD and back-branch
versions of this patch is that in HEAD we can add the new bitmap field
to ResultRelInfo, but that would cause an ABI break in back branches.
Like 4b3e37993, add a List field at the end of struct EState instead.
Back-patch to v13. The bogus calculation is also being made in v12,
but it doesn't have the same visible effect because we don't use it
to decide which generated columns to recalculate; as a consequence of
which the patch doesn't apply easily. I think that there might still
be a demonstrable bug associated with trigger firing conditions, but
that's such a weird corner-case usage that I'm content to leave it
unfixed in v12.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/CA+HiwqFshLKNvQUd1DgwJ-7tsTp=dwv7KZqXC4j2wYBV1aCDUA@mail.gmail.com
Discussion: https://postgr.es/m/2793383.1672944799@sss.pgh.pa.us
In commit ffbb7e65a, I added a ModifyTableState member to ResultRelInfo
to save the owning ModifyTableState for use by nodeModifyTable.c when
performing batch inserts, but as pointed out by Tom Lane, that changed
the array stride of es_result_relations, and that would break any
previously-compiled extension code that accesses that array. Fix by
removing that member from ResultRelInfo and instead adding a List member
at the end of EState to save such ModifyTableStates.
Per report from Tom Lane. Back-patch to v14, like the previous commit;
I chose to apply the patch to HEAD as well, to make back-patching easy.
Discussion: http://postgr.es/m/4065383.1669395453%40sss.pgh.pa.us
Currently, information about the permissions to be checked on relations
mentioned in a query is stored in their range table entries. So the
executor must scan the entire range table looking for relations that
need to have permissions checked. This can make the permission checking
part of the executor initialization needlessly expensive when many
inheritance children are present in the range table. While the
permissions need not be checked on the individual child relations, the
executor still must visit every range table entry to filter them out.
This commit moves the permission checking information out of the range
table entries into a new plan node called RTEPermissionInfo. Every
top-level (inheritance "root") RTE_RELATION entry in the range table
gets one and a list of those is maintained alongside the range table.
This new list is initialized by the parser when initializing the range
table. The rewriter can add more entries to it as rules/views are
expanded. Finally, the planner combines the lists of the individual
subqueries into one flat list that is passed to the executor for
checking.
To make it quick to find the RTEPermissionInfo entry belonging to a
given relation, RangeTblEntry gets a new Index field 'perminfoindex'
that stores the corresponding RTEPermissionInfo's index in the query's
list of the latter.
ExecutorCheckPerms_hook has gained another List * argument; the
signature is now:
typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
                                              List *rtePermInfos,
                                              bool ereport_on_violation);
The first argument is no longer used by any in-core uses of the hook,
but we leave it in place because there may be other implementations that
do. Implementations should likely scan the rtePermInfos list to
determine which operations to allow or deny.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
ri_RootToPartitionMap is currently only initialized for tuple routing
target partitions, though a future commit will need the ability to use
it even for the non-partition child tables, so make adjustments to
decouple it from the partitioning code.
Also, make it lazily initialized via ExecGetRootToChildMap(), making
that function its preferred access path. Existing third-party code
accessing it directly should no longer do so; consequently, it's been
renamed to ri_RootToChildMap, which also makes it consistent with
ri_ChildToRootMap.
ExecGetRootToChildMap() houses the logic of setting the map appropriately
depending on whether a given child relation is partition or not.
To support this, also add a separate entry point for TupleConversionMap
creation that receives an AttrMap. No new code here, just split an
existing function in two.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqEYUhDXSK5BTvG_xk=eaAEJCD4GS3C6uH7ybBvv+Z_Tmg@mail.gmail.com
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set in the latter instead is
an index field pointing to the element of
PlannedStmt.partPruneInfos that holds the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. Making the
PartitionPruneInfos directly accessible by simply iterating over
PlannedStmt.partPruneInfos is better than requiring a walk of the
plan tree to find them.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
A future commit will move the checkAsUser field from RangeTblEntry
to a new node that, unlike RTEs, will only be created for tables
mentioned in the query but not for the inheritance child relations
added to the query by the planner. So, checkAsUser value for a
given child relation will have to be obtained by referring to that
for its ancestor mentioned in the query.
In preparation, it seems better to expand the use of RelOptInfo.userid
during planning in place of rte->checkAsUser so that there will be
fewer places to adjust for the above change.
Given that the child-to-ancestor mapping is not available during the
execution of a given "child" ForeignScan node, add a checkAsUser
field to ForeignScan to carry the child relation's RelOptInfo.userid.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqGFCs2uq7VRKi7g+FFKbP6Ea_2_HkgZb2HPhUfaAKT3ng@mail.gmail.com
When it's given as true, return a 0 in the position of the missing
column rather than raising an error.
This is currently unused, but it allows us to reimplement column
permission checking in a subsequent commit. It seems worth breaking
into a separate commit because it affects unrelated code.
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFfiai=qBxPDTjaio_ZcaqUKh+FC=prESrB8ogZgFNNNQ@mail.gmail.com
Commit b663a4136, which allowed FDWs to INSERT rows in bulk, added to
nodeModifyTable.c code to flush pending inserts to the foreign-table
result relation(s) before completing processing of the ModifyTable node,
but the code failed to take into account the case where the INSERT query
has modifying CTEs, leading to incorrect results.
Also, that commit failed to flush pending inserts before firing BEFORE
ROW triggers so that rows are visible to such triggers.
In that commit we scanned through EState's
es_tuple_routing_result_relations or es_opened_result_relations list to
find the foreign-table result relations to which pending inserts are
flushed, but that would be inefficient in some cases. So to fix, 1) add
a List member to EState to record the insert-pending result relations,
and 2) modify nodeModifyTable.c so that it adds the foreign-table result
relation to the list in ExecInsert() if appropriate, and flushes pending
inserts properly using the list where needed.
While here, fix a copy-and-pasteo in a comment in ExecBatchInsert(),
which was added by that commit.
Back-patch to v14 where that commit appeared.
Discussion: https://postgr.es/m/CAPmGK16qutyCmyJJzgQOhfBq%3DNoGDqTB6O0QBZTihrbqre%2BoxA%40mail.gmail.com
The API contract for planstate_tree_walker() callbacks is that they
take a PlanState pointer and a context pointer. Somebody figured
they could save a couple lines of code by ignoring that, and passing
ExecShutdownNode itself as the walker even though it has but one
argument. Somewhat remarkably, we've gotten away with that so far.
However, it seems clear that the upcoming C2x standard means to
forbid such cases, and compilers that actively break such code
likely won't be far behind. So spend the extra few lines of code
to do it honestly with a separate walker function.
In HEAD, we might as well go further and remove ExecShutdownNode's
useless return value. I left that as-is in back branches though,
to forestall complaints about ABI breakage.
Back-patch, with the thought that this might become of practical
importance before our stable branches are all out of service.
It doesn't seem to be fixing any live bug on any currently known
platform, however.
Discussion: https://postgr.es/m/208054.1663534665@sss.pgh.pa.us
MERGE performs actions that modify rows in the target table using a
source table or query. MERGE provides a single SQL statement that can
conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise
require multiple PL statements. For example,
    MERGE INTO target AS t
    USING source AS s
    ON t.tid = s.sid
    WHEN MATCHED AND t.balance > s.delta THEN
      UPDATE SET balance = t.balance - s.delta
    WHEN MATCHED THEN
      DELETE
    WHEN NOT MATCHED AND s.delta > 0 THEN
      INSERT VALUES (s.sid, s.delta)
    WHEN NOT MATCHED THEN
      DO NOTHING;
MERGE works with regular tables, partitioned tables and inheritance
hierarchies, including column and row security enforcement, as well as
support for row and statement triggers and transition tables therein.
MERGE is optimized for OLTP and is parameterizable, though also useful
for large scale ETL/ELT. MERGE is not intended to be used in preference
to existing single SQL commands for INSERT, UPDATE or DELETE since there
is some overhead. MERGE can be used from PL/pgSQL.
MERGE does not support targeting updatable views or foreign tables, and
RETURNING clauses are not allowed either. These limitations are likely
fixable with sufficient effort. Rewrite rules are also not supported,
but it's not clear that we'd want to support them.
Author: Pavan Deolasee <pavan.deolasee@gmail.com>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions)
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
When an update on a partitioned table referenced in foreign key
constraints causes a row to move from one partition to another,
the fact that the move is implemented as a delete followed by an insert
on the target partition causes the foreign key triggers to have
surprising behavior. For example, a given foreign key's delete trigger
which implements the ON DELETE CASCADE clause of that key will delete
any referencing rows when triggered for that internal DELETE, although
it should not, because the referenced row is simply being moved from one
partition of the referenced root partitioned table into another, not
being deleted from it.
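A sketch of the scenario (hypothetical names):

    CREATE TABLE parent (a int PRIMARY KEY) PARTITION BY LIST (a);
    CREATE TABLE parent1 PARTITION OF parent FOR VALUES IN (1);
    CREATE TABLE parent2 PARTITION OF parent FOR VALUES IN (2);
    CREATE TABLE child (a int REFERENCES parent ON DELETE CASCADE);
    INSERT INTO parent VALUES (1);
    INSERT INTO child VALUES (1);
    UPDATE parent SET a = 2 WHERE a = 1;
    -- The row moves from parent1 to parent2; before this fix, the
    -- internal DELETE wrongly cascaded and removed the child row.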
This commit teaches trigger.c to skip queuing such delete trigger events
on the leaf partitions in favor of an UPDATE event fired on the root
target relation. Doing so is sensible because both the old and the new
tuple "logically" belong to the root relation.
The after trigger event queuing interface now allows passing the source
and the target partitions of a particular cross-partition update when
registering the update event for the root partitioned table. Along with
the two ctids of the old and the new tuple, the after trigger event now
also stores the OIDs of those partitions. The tuples fetched from the
source and the target partitions are converted into the root table
format, if necessary, before they are passed to the trigger function.
The implementation currently has a limitation that only the foreign keys
pointing into the query's target relation are considered, not those of
its sub-partitioned partitions. That seems like a reasonable
limitation, because it sounds rare to have distinct foreign keys
pointing to sub-partitioned partitions instead of to the root table.
This misbehavior stems from commit f56f8f8da6 (which added support for
foreign keys to reference partitioned tables) not paying sufficient
attention to commit 2f17844104 (which had introduced cross-partition
updates a year earlier). Even though the former commit goes back to
Postgres 12, we're not backpatching this fix at this time for fear of
destabilizing things too much, and because there are a few ABI breaks in
it that we'd have to work around in older branches. It also depends on
commit f4566345cf, which had its own share of backpatchability issues
as well.
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Eduard Català <eduard.catala@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFvkBCmfwkQX_yBqv2Wz8ugUGiBDxum8=WvVbfU1TXaNg@mail.gmail.com
Discussion: https://postgr.es/m/CAL54xNZsLwEM1XCk5yW9EqaRzsZYHuWsHQkA2L5MOSKXAwviCQ@mail.gmail.com