1
0
mirror of https://github.com/postgres/postgres.git synced 2025-11-19 13:42:17 +03:00

Implement Self-Join Elimination

The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result.  Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures.  Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses.  Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.

This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.

  SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
  SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
  SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;

In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.

  SELECT * FROM t1 WHERE x IN
    (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);

Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.

The SJE proof is based on innerrel_is_unique() machinery.

We can remove a self-join when for each outer row:

 1. At most, one inner row matches the join clause;
 2. Each matched inner row must be (physically) the same as the outer one;
 3. Inner and outer rows have the same row mark.

In this patch, we use the next approach to identify a self-join:

 1. Collect all merge-joinable join quals which look like a.x = b.x;
 2. Add to the list above the baseretrictinfo of the inner table;
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables;
 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
    possibility of self-join elimination, the inner and outer clauses must
    match exactly.

The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some of the existing regression tests changed due
to self-join removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
This commit is contained in:
Alexander Korotkov
2025-02-13 00:56:03 +02:00
parent 3fb58625d1
commit fc069a3a63
19 changed files with 2983 additions and 148 deletions

View File

@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
PVC_INCLUDE_PLACEHOLDERS);
PVC_INCLUDE_PLACEHOLDERS |
PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{

View File

@@ -4162,6 +4162,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
{
return relation_has_unique_index_ext(root, rel, restrictlist,
exprlist, oprlist, NULL);
}
/*
* relation_has_unique_index_ext
* Same as relation_has_unique_index_for(), but supports extra_clauses
* parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
* which were used to derive uniqueness.
*/
bool
relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist,
List **extra_clauses)
{
ListCell *ic;
@@ -4217,6 +4233,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@@ -4268,6 +4285,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
{
MemoryContext oldMemCtx =
MemoryContextSwitchTo(root->planner_cxt);
/*
* Add filter clause into a list allowing caller to
* know if uniqueness have made not only by join
* clauses.
*/
Assert(bms_is_empty(rinfo->left_relids) ||
bms_is_empty(rinfo->right_relids));
if (extra_clauses)
exprs = lappend(exprs, rinfo);
MemoryContextSwitchTo(oldMemCtx);
}
break;
}
}
@@ -4310,7 +4345,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
{
if (extra_clauses)
*extra_clauses = exprs;
return true;
}
}
return false;

File diff suppressed because it is too large Load Diff

View File

@@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
/*
* Remove self joins on a unique column.
*/
joinlist = remove_useless_self_joins(root, joinlist);
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a

View File

@@ -639,14 +639,17 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
* using the subquery's original targetlist expressions, not the
* using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
* be an RTE corresponding to each setop's output. Note, we use this not
* subquery's targetlist but subroot->parse's targetlist, because it was
* revised by self-join removal. subquery's targetlist might contain the
* references to the removed relids.
*/
if (pNumGroups)
{
@@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
get_tlist_exprs(subquery->targetList, false),
get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);