1
0
mirror of https://github.com/postgres/postgres.git synced 2025-04-24 10:47:04 +03:00

Implement Self-Join Elimination

The Self-Join Elimination (SJE) feature removes an inner join of a plain
table to itself in the query tree if it is proven that the join can be
replaced with a scan without impacting the query result.  Self-join and
inner relation get replaced with the outer in query, equivalence classes,
and planner info structures.  Also, the inner restrictlist moves to the
outer one with the removal of duplicated clauses.  Thus, this optimization
reduces the length of the range table list (this especially makes sense for
partitioned relations), reduces the number of restriction clauses and,
in turn, selectivity estimations, and potentially improves total planner
prediction for the query.

This feature is dedicated to avoiding redundancy, which can appear after
pull-up transformations or the creation of an EquivalenceClass-derived clause
like the below.

  SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3);
  SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x);
  SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x;

In the future, we could also reduce redundancy caused by subquery pull-up
after unnecessary outer join removal in cases like the one below.

  SELECT * FROM t1 WHERE x IN
    (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x);

Also, it can drastically help to join partitioned tables, removing entries
even before their expansion.

The SJE proof is based on innerrel_is_unique() machinery.

We can remove a self-join when for each outer row:

 1. At most, one inner row matches the join clause;
 2. Each matched inner row must be (physically) the same as the outer one;
 3. Inner and outer rows have the same row mark.

In this patch, we use the next approach to identify a self-join:

 1. Collect all merge-joinable join quals which look like a.x = b.x;
 2. Add to the list above the baseretrictinfo of the inner table;
 3. Check innerrel_is_unique() for the qual list.  If it returns false, skip
    this pair of joining tables;
 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the
    possibility of self-join elimination, the inner and outer clauses must
    match exactly.

The relation replacement procedure is not trivial and is partly combined
with the one used to remove useless left joins.  Tests covering this feature
were added to join.sql.  Some of the existing regression tests changed due
to self-join removal logic.

Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru>
Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>
Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com>
Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Simon Riggs <simon@2ndquadrant.com>
Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org>
Reviewed-by: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com>
Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Hywel Carver <hywel@skillerwhale.com>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
Reviewed-by: Greg Stark <stark@mit.edu>
Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec>
Reviewed-by: Michał Kłeczek <michal@kleczek.org>
Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
This commit is contained in:
Alexander Korotkov 2025-02-13 00:56:03 +02:00
parent 3fb58625d1
commit fc069a3a63
19 changed files with 2983 additions and 148 deletions

View File

@ -5544,6 +5544,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
<varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
<term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Enables or disables the query planner's optimization which analyses
the query tree and replaces self joins with semantically equivalent
single scans. Takes into consideration only plain tables.
The default is <literal>on</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
<indexterm>

View File

@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
exprvars = pull_var_clause((Node *) exprs,
PVC_INCLUDE_AGGREGATES |
PVC_INCLUDE_WINDOWFUNCS |
PVC_INCLUDE_PLACEHOLDERS);
PVC_INCLUDE_PLACEHOLDERS |
PVC_INCLUDE_CONVERTROWTYPES);
foreach(lc, ec->ec_members)
{

View File

@ -4162,6 +4162,22 @@ bool
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist)
{
return relation_has_unique_index_ext(root, rel, restrictlist,
exprlist, oprlist, NULL);
}
/*
* relation_has_unique_index_ext
* Same as relation_has_unique_index_for(), but supports extra_clauses
* parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
* which were used to derive uniqueness.
*/
bool
relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist,
List **extra_clauses)
{
ListCell *ic;
@ -4217,6 +4233,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
List *exprs = NIL;
/*
* If the index is not unique, or not immediately enforced, or if it's
@ -4268,6 +4285,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
{
MemoryContext oldMemCtx =
MemoryContextSwitchTo(root->planner_cxt);
/*
* Add filter clause into a list allowing caller to
* know if uniqueness have made not only by join
* clauses.
*/
Assert(bms_is_empty(rinfo->left_relids) ||
bms_is_empty(rinfo->right_relids));
if (extra_clauses)
exprs = lappend(exprs, rinfo);
MemoryContextSwitchTo(oldMemCtx);
}
break;
}
}
@ -4310,7 +4345,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
/* Matched all key columns of this index? */
if (c == ind->nkeycolumns)
{
if (extra_clauses)
*extra_clauses = exprs;
return true;
}
}
return false;

File diff suppressed because it is too large Load Diff

View File

@ -233,6 +233,11 @@ query_planner(PlannerInfo *root,
*/
reduce_unique_semijoins(root);
/*
* Remove self joins on a unique column.
*/
joinlist = remove_useless_self_joins(root, joinlist);
/*
* Now distribute "placeholders" to base rels as needed. This has to be
* done after join removal because removal could change whether a

View File

@ -639,14 +639,17 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
* otherwise do statistical estimation.
*
* XXX you don't really want to know about this: we do the estimation
* using the subquery's original targetlist expressions, not the
* using the subroot->parse's original targetlist expressions, not the
* subroot->processed_tlist which might seem more appropriate. The reason
* is that if the subquery is itself a setop, it may return a
* processed_tlist containing "varno 0" Vars generated by
* generate_append_tlist, and those would confuse estimate_num_groups
* mightily. We ought to get rid of the "varno 0" hack, but that requires
* a redesign of the parsetree representation of setops, so that there can
* be an RTE corresponding to each setop's output.
* be an RTE corresponding to each setop's output. Note, we use this not
* subquery's targetlist but subroot->parse's targetlist, because it was
* revised by self-join removal. subquery's targetlist might contain the
* references to the removed relids.
*/
if (pNumGroups)
{
@ -659,7 +662,7 @@ build_setop_child_paths(PlannerInfo *root, RelOptInfo *rel,
*pNumGroups = rel->cheapest_total_path->rows;
else
*pNumGroups = estimate_num_groups(subroot,
get_tlist_exprs(subquery->targetList, false),
get_tlist_exprs(subroot->parse->targetList, false),
rel->cheapest_total_path->rows,
NULL,
NULL);

View File

@ -64,7 +64,6 @@ static bool locate_windowfunc_walker(Node *node,
locate_windowfunc_context *context);
static bool checkExprHasSubLink_walker(Node *node, void *context);
static Relids offset_relid_set(Relids relids, int offset);
static Relids adjust_relid_set(Relids relids, int oldrelid, int newrelid);
static Node *add_nulling_relids_mutator(Node *node,
add_nulling_relids_context *context);
static Node *remove_nulling_relids_mutator(Node *node,
@ -543,6 +542,8 @@ offset_relid_set(Relids relids, int offset)
* (identified by sublevels_up and rt_index), and change their varno fields
* to 'new_index'. The varnosyn fields are changed too. Also, adjust other
* nodes that contain rangetable indexes, such as RangeTblRef and JoinExpr.
* Specifying 'change_RangeTblRef' to false allows skipping RangeTblRef.
* See ChangeVarNodesExtended for details.
*
* NOTE: although this has the form of a walker, we cheat and modify the
* nodes in-place. The given expression tree should have been copied
@ -554,6 +555,7 @@ typedef struct
int rt_index;
int new_index;
int sublevels_up;
bool change_RangeTblRef;
} ChangeVarNodes_context;
static bool
@ -586,7 +588,7 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
cexpr->cvarno = context->new_index;
return false;
}
if (IsA(node, RangeTblRef))
if (IsA(node, RangeTblRef) && context->change_RangeTblRef)
{
RangeTblRef *rtr = (RangeTblRef *) node;
@ -633,6 +635,75 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
}
return false;
}
if (IsA(node, RestrictInfo))
{
RestrictInfo *rinfo = (RestrictInfo *) node;
int relid = -1;
bool is_req_equal =
(rinfo->required_relids == rinfo->clause_relids);
bool clause_relids_is_multiple =
(bms_membership(rinfo->clause_relids) == BMS_MULTIPLE);
if (bms_is_member(context->rt_index, rinfo->clause_relids))
{
expression_tree_walker((Node *) rinfo->clause, ChangeVarNodes_walker, (void *) context);
expression_tree_walker((Node *) rinfo->orclause, ChangeVarNodes_walker, (void *) context);
rinfo->clause_relids =
adjust_relid_set(rinfo->clause_relids, context->rt_index, context->new_index);
rinfo->num_base_rels = bms_num_members(rinfo->clause_relids);
rinfo->left_relids =
adjust_relid_set(rinfo->left_relids, context->rt_index, context->new_index);
rinfo->right_relids =
adjust_relid_set(rinfo->right_relids, context->rt_index, context->new_index);
}
if (is_req_equal)
rinfo->required_relids = rinfo->clause_relids;
else
rinfo->required_relids =
adjust_relid_set(rinfo->required_relids, context->rt_index, context->new_index);
rinfo->outer_relids =
adjust_relid_set(rinfo->outer_relids, context->rt_index, context->new_index);
rinfo->incompatible_relids =
adjust_relid_set(rinfo->incompatible_relids, context->rt_index, context->new_index);
if (rinfo->mergeopfamilies &&
bms_get_singleton_member(rinfo->clause_relids, &relid) &&
clause_relids_is_multiple &&
relid == context->new_index && IsA(rinfo->clause, OpExpr))
{
Expr *leftOp;
Expr *rightOp;
leftOp = (Expr *) get_leftop(rinfo->clause);
rightOp = (Expr *) get_rightop(rinfo->clause);
/*
* For self-join elimination, changing varnos could transform
* "t1.a = t2.a" into "t1.a = t1.a". That is always true as long
* as "t1.a" is not null. We use qual() to check for such a case,
* and then we replace the qual for a check for not null
* (NullTest).
*/
if (leftOp != NULL && equal(leftOp, rightOp))
{
NullTest *ntest = makeNode(NullTest);
ntest->arg = leftOp;
ntest->nulltesttype = IS_NOT_NULL;
ntest->argisrow = false;
ntest->location = -1;
rinfo->clause = (Expr *) ntest;
rinfo->mergeopfamilies = NIL;
rinfo->left_em = NULL;
rinfo->right_em = NULL;
}
Assert(rinfo->orclause == NULL);
}
return false;
}
if (IsA(node, AppendRelInfo))
{
AppendRelInfo *appinfo = (AppendRelInfo *) node;
@ -665,32 +736,32 @@ ChangeVarNodes_walker(Node *node, ChangeVarNodes_context *context)
return expression_tree_walker(node, ChangeVarNodes_walker, context);
}
/*
* ChangeVarNodesExtended - similar to ChangeVarNodes, but has additional
* 'change_RangeTblRef' param
*
* ChangeVarNodes changes a given node and all of its underlying nodes.
* However, self-join elimination (SJE) needs to skip the RangeTblRef node
* type. During SJE's last step, remove_rel_from_joinlist() removes
* remaining RangeTblRefs with target relid. If ChangeVarNodes() replaces
* the target relid before, remove_rel_from_joinlist() fails to identify
* the nodes to delete.
*/
void
ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
int sublevels_up, bool change_RangeTblRef)
{
ChangeVarNodes_context context;
context.rt_index = rt_index;
context.new_index = new_index;
context.sublevels_up = sublevels_up;
context.change_RangeTblRef = change_RangeTblRef;
/*
* Must be prepared to start with a Query or a bare expression tree; if
* it's a Query, go straight to query_tree_walker to make sure that
* sublevels_up doesn't get incremented prematurely.
*/
if (node && IsA(node, Query))
{
Query *qry = (Query *) node;
/*
* If we are starting at a Query, and sublevels_up is zero, then we
* must also fix rangetable indexes in the Query itself --- namely
* resultRelation, mergeTargetRelation, exclRelIndex and rowMarks
* entries. sublevels_up cannot be zero when recursing into a
* subquery, so there's no need to have the same logic inside
* ChangeVarNodes_walker.
*/
if (sublevels_up == 0)
{
ListCell *l;
@ -701,7 +772,6 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
if (qry->mergeTargetRelation == rt_index)
qry->mergeTargetRelation = new_index;
/* this is unlikely to ever be used, but ... */
if (qry->onConflict && qry->onConflict->exclRelIndex == rt_index)
qry->onConflict->exclRelIndex = new_index;
@ -719,15 +789,22 @@ ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
ChangeVarNodes_walker(node, &context);
}
void
ChangeVarNodes(Node *node, int rt_index, int new_index, int sublevels_up)
{
ChangeVarNodesExtended(node, rt_index, new_index, sublevels_up, true);
}
/*
* Substitute newrelid for oldrelid in a Relid set
* adjust_relid_set - substitute newrelid for oldrelid in a Relid set
*
* Note: some extensions may pass a special varno such as INDEX_VAR for
* oldrelid. bms_is_member won't like that, but we should tolerate it.
* (Perhaps newrelid could also be a special varno, but there had better
* not be a reason to inject that into a nullingrels or phrels set.)
* Attempt to remove oldrelid from a Relid set (as long as it's not a special
* varno). If oldrelid was found and removed, insert newrelid into a Relid
* set (as long as it's not a special varno). Therefore, when oldrelid is
* a special varno, this function does nothing. When newrelid is a special
* varno, this function behaves as delete.
*/
static Relids
Relids
adjust_relid_set(Relids relids, int oldrelid, int newrelid)
{
if (!IS_SPECIAL_VARNO(oldrelid) && bms_is_member(oldrelid, relids))
@ -736,7 +813,8 @@ adjust_relid_set(Relids relids, int oldrelid, int newrelid)
relids = bms_copy(relids);
/* Remove old, add new */
relids = bms_del_member(relids, oldrelid);
relids = bms_add_member(relids, newrelid);
if (!IS_SPECIAL_VARNO(newrelid))
relids = bms_add_member(relids, newrelid);
}
return relids;
}

View File

@ -988,6 +988,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
{
{"enable_self_join_elimination", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enable removal of unique self-joins."),
NULL,
GUC_EXPLAIN | GUC_NOT_IN_SAMPLE
},
&enable_self_join_elimination,
true,
NULL, NULL, NULL
},
{
{"enable_group_by_reordering", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables reordering of GROUP BY keys."),

View File

@ -201,6 +201,11 @@ typedef struct PlannerGlobal
* Not all fields are printed. (In some cases, there is no print support for
* the field type; in others, doing so would lead to infinite recursion or
* bloat dump output more than seems useful.)
*
* NOTE: When adding new entries containing relids and relid bitmapsets,
* remember to check that they will be correctly processed by
* the remove_self_join_rel function - relid of removing relation will be
* correctly replaced with the keeping one.
*----------
*/
#ifndef HAVE_PLANNERINFO_TYPEDEF
@ -753,7 +758,7 @@ typedef struct PartitionSchemeData *PartitionScheme;
* populate these fields, for base rels; but someday they might be used for
* join rels too:
*
* unique_for_rels - list of Relid sets, each one being a set of other
* unique_for_rels - list of UniqueRelInfo, each one being a set of other
* rels for which this one has been proven unique
* non_unique_for_rels - list of Relid sets, each one being a set of
* other rels for which we have tried and failed to prove
@ -992,7 +997,7 @@ typedef struct RelOptInfo
/*
* cache space for remembering if we have proven this relation unique
*/
/* known unique for these other relid set(s) */
/* known unique for these other relid set(s) given in UniqueRelInfo(s) */
List *unique_for_rels;
/* known not unique for these set(s) */
List *non_unique_for_rels;
@ -3463,4 +3468,35 @@ typedef struct AggTransInfo
bool initValueIsNull;
} AggTransInfo;
/*
* UniqueRelInfo caches a fact that a relation is unique when being joined
* to other relation(s).
*/
typedef struct UniqueRelInfo
{
pg_node_attr(no_copy_equal, no_read, no_query_jumble)
NodeTag type;
/*
* The relation in consideration is unique when being joined with this set
* of other relation(s).
*/
Relids outerrelids;
/*
* The relation in consideration is unique when considering only clauses
* suitable for self-join (passed split_selfjoin_quals()).
*/
bool self_join;
/*
* Additional clauses from a baserestrictinfo list that were used to prove
* the uniqueness. We cache it for the self-join checking procedure: a
* self-join can be removed if the outer relation contains strictly the
* same set of clauses.
*/
List *extra_clauses;
} UniqueRelInfo;
#endif /* PATHNODES_H */

View File

@ -192,6 +192,8 @@ extern SortGroupClause *get_sortgroupref_clause_noerr(Index sortref,
* output list */
#define PVC_RECURSE_PLACEHOLDERS 0x0020 /* recurse into PlaceHolderVar
* arguments */
#define PVC_INCLUDE_CONVERTROWTYPES 0x0040 /* include ConvertRowtypeExprs in
* output list */
extern Bitmapset *pull_varnos(PlannerInfo *root, Node *node);
extern Bitmapset *pull_varnos_of_level(PlannerInfo *root, Node *node, int levelsup);

View File

@ -72,6 +72,9 @@ extern void create_index_paths(PlannerInfo *root, RelOptInfo *rel);
extern bool relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist,
List *exprlist, List *oprlist);
extern bool relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
List *restrictlist, List *exprlist,
List *oprlist, List **extra_clauses);
extern bool indexcol_is_bool_constant_for_query(PlannerInfo *root,
IndexOptInfo *index,
int indexcol);

View File

@ -20,6 +20,7 @@
/* GUC parameters */
#define DEFAULT_CURSOR_TUPLE_FRACTION 0.1
extern PGDLLIMPORT double cursor_tuple_fraction;
extern PGDLLIMPORT bool enable_self_join_elimination;
/* query_planner callback to compute query_pathkeys */
typedef void (*query_pathkeys_callback) (PlannerInfo *root, void *extra);
@ -113,6 +114,11 @@ extern bool query_is_distinct_for(Query *query, List *colnos, List *opids);
extern bool innerrel_is_unique(PlannerInfo *root,
Relids joinrelids, Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist, bool force_cache);
extern bool innerrel_is_unique_ext(PlannerInfo *root, Relids joinrelids,
Relids outerrelids, RelOptInfo *innerrel,
JoinType jointype, List *restrictlist,
bool force_cache, List **uclauses);
extern List *remove_useless_self_joins(PlannerInfo *root, List *jointree);
/*
* prototypes for plan/setrefs.c

View File

@ -15,6 +15,7 @@
#define REWRITEMANIP_H
#include "nodes/parsenodes.h"
#include "nodes/pathnodes.h"
struct AttrMap; /* avoid including attmap.h here */
@ -41,11 +42,14 @@ typedef enum ReplaceVarsNoMatchOption
} ReplaceVarsNoMatchOption;
extern Relids adjust_relid_set(Relids relids, int oldrelid, int newrelid);
extern void CombineRangeTables(List **dst_rtable, List **dst_perminfos,
List *src_rtable, List *src_perminfos);
extern void OffsetVarNodes(Node *node, int offset, int sublevels_up);
extern void ChangeVarNodes(Node *node, int rt_index, int new_index,
int sublevels_up);
extern void ChangeVarNodesExtended(Node *node, int rt_index, int new_index,
int sublevels_up, bool change_RangeTblRef);
extern void IncrementVarSublevelsUp(Node *node, int delta_sublevels_up,
int min_sublevels_up);
extern void IncrementVarSublevelsUp_rtable(List *rtable,

View File

@ -434,6 +434,36 @@ explain (costs off)
Filter: ((unique1 IS NOT NULL) AND (unique2 IS NOT NULL))
(2 rows)
-- Test that broken ECs are processed correctly during self join removal.
-- Disable merge joins so that we don't get an error about missing commutator.
-- Test both orientations of the join clause, because only one of them breaks
-- the EC.
set enable_mergejoin to off;
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on m.ff + n.ff = p.f1;
QUERY PLAN
---------------------------------------
Nested Loop
Join Filter: ((n.ff + n.ff) = p.f1)
-> Seq Scan on ec0 n
-> Materialize
-> Seq Scan on ec1 p
(5 rows)
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
QUERY PLAN
---------------------------------------------------------------
Nested Loop
Join Filter: ((p.f1)::bigint = ((n.ff + n.ff))::int8alias1)
-> Seq Scan on ec0 n
-> Materialize
-> Seq Scan on ec1 p
(5 rows)
reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;

File diff suppressed because it is too large Load Diff

View File

@ -168,10 +168,11 @@ select name, setting from pg_settings where name like 'enable%';
enable_partitionwise_aggregate | off
enable_partitionwise_join | off
enable_presorted_aggregate | on
enable_self_join_elimination | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
(23 rows)
(24 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.

View File

@ -259,6 +259,22 @@ drop user regress_user_ectest;
explain (costs off)
select * from tenk1 where unique1 = unique1 and unique2 = unique2;
-- Test that broken ECs are processed correctly during self join removal.
-- Disable merge joins so that we don't get an error about missing commutator.
-- Test both orientations of the join clause, because only one of them breaks
-- the EC.
set enable_mergejoin to off;
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on m.ff + n.ff = p.f1;
explain (costs off)
select * from ec0 m join ec0 n on m.ff = n.ff
join ec1 p on p.f1::int8 = (m.ff + n.ff)::int8alias1;
reset enable_mergejoin;
-- this could be converted, but isn't at present
explain (costs off)
select * from tenk1 where unique1 = unique1 or unique2 = unique2;

View File

@ -2175,6 +2175,17 @@ select c.id, ss.a from c
left join (select d.a from onerow, d left join b on d.a = b.id) ss
on c.id = ss.a;
-- check the case when the placeholder relates to an outer join and its
-- inner in the press field but actually uses only the outer side of the join
explain (costs off)
SELECT q.val FROM b LEFT JOIN (
SELECT (q1.z IS NOT NULL) AS val
FROM b LEFT JOIN (
SELECT (t1.b_id IS NOT NULL) AS z FROM a t1 LEFT JOIN a t2 USING (id)
) AS q1
ON true
) AS q ON true;
CREATE TEMP TABLE parted_b (id int PRIMARY KEY) partition by range(id);
CREATE TEMP TABLE parted_b1 partition of parted_b for values from (0) to (10);
@ -2409,6 +2420,489 @@ select * from
select * from
int8_tbl x join (int4_tbl x cross join int4_tbl y(ff)) j on q1 = f1; -- ok
--
-- test that semi- or inner self-joins on a unique column are removed
--
-- enable only nestloop to get more predictable plans
set enable_hashjoin to off;
set enable_mergejoin to off;
create table sj (a int unique, b int, c int unique);
insert into sj values (1, null, 2), (null, 2, null), (2, 1, 1);
analyze sj;
-- Trivial self-join case.
explain (costs off)
select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
select p.* from sj p, sj q where q.a = p.a and q.b = q.a - 1;
-- Self-join removal performs after a subquery pull-up process and could remove
-- such kind of self-join too. Check this option.
explain (costs off)
select * from sj p
where exists (select * from sj q
where q.a = p.a and q.b < 10);
select * from sj p
where exists (select * from sj q
where q.a = p.a and q.b < 10);
-- Don't remove self-join for the case of equality of two different unique columns.
explain (costs off)
select * from sj t1, sj t2 where t1.a = t2.c and t1.b is not null;
-- Ensure that relations with TABLESAMPLE clauses are not considered as
-- candidates to be removed
explain (costs off)
select * from sj t1
join lateral
(select * from sj tablesample system(t1.b)) s
on t1.a = s.a;
-- Ensure that SJE does not form a self-referential lateral dependency
explain (costs off)
select * from sj t1
left join lateral
(select t1.a as t1a, * from sj t2) s
on true
where t1.a = s.a;
-- Degenerated case.
explain (costs off)
select * from
(select a as x from sj where false) as q1,
(select a as y from sj where false) as q2
where q1.x = q2.y;
-- We can't use a cross-EC generated self join qual because of current logic of
-- the generate_join_implied_equalities routine.
explain (costs off)
select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a;
explain (costs off)
select * from sj t1, sj t2, sj t3
where t1.a = t1.b and t1.b = t2.b and t2.b = t2.a and
t1.b = t3.b and t3.b = t3.a;
-- Double self-join removal.
-- Use a condition on "b + 1", not on "b", for the second join, so that
-- the equivalence class is different from the first one, and we can
-- test the non-ec code path.
explain (costs off)
select *
from sj t1
join sj t2 on t1.a = t2.a and t1.b = t2.b
join sj t3 on t2.a = t3.a and t2.b + 1 = t3.b + 1;
-- subselect that references the removed relation
explain (costs off)
select t1.a, (select a from sj where a = t2.a and a = t1.a)
from sj t1, sj t2
where t1.a = t2.a;
-- self-join under outer join
explain (costs off)
select * from sj x join sj y on x.a = y.a
left join int8_tbl z on x.a = z.q1;
explain (costs off)
select * from sj x join sj y on x.a = y.a
left join int8_tbl z on y.a = z.q1;
explain (costs off)
select * from (
select t1.*, t2.a as ax from sj t1 join sj t2
on (t1.a = t2.a and t1.c * t1.c = t2.c + 2 and t2.b is null)
) as q1
left join
(select t3.* from sj t3, sj t4 where t3.c = t4.c) as q2
on q1.ax = q2.a;
-- Test that placeholders are updated correctly after join removal
explain (costs off)
select * from (values (1)) x
left join (select coalesce(y.q1, 1) from int8_tbl y
right join sj j1 inner join sj j2 on j1.a = j2.a
on true) z
on true;
-- Test that references to the removed rel in lateral subqueries are replaced
-- correctly after join removal
explain (verbose, costs off)
select t3.a from sj t1
join sj t2 on t1.a = t2.a
join lateral (select t1.a offset 0) t3 on true;
explain (verbose, costs off)
select t3.a from sj t1
join sj t2 on t1.a = t2.a
join lateral (select * from (select t1.a offset 0) offset 0) t3 on true;
explain (verbose, costs off)
select t4.a from sj t1
join sj t2 on t1.a = t2.a
join lateral (select t3.a from sj t3, (select t1.a) offset 0) t4 on true;
-- Check updating of semi_rhs_exprs links from upper-level semi join to
-- the removing relation
explain (verbose, costs off)
select t1.a from sj t1 where t1.b in (
select t2.b from sj t2 join sj t3 on t2.c=t3.c);
--
-- SJE corner case: uniqueness of an inner is [partially] derived from
-- baserestrictinfo clauses.
-- XXX: We really should allow SJE for these corner cases?
--
INSERT INTO sj VALUES (3, 1, 3);
-- Don't remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
-- Return one row
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 3;
-- Remove SJ, define uniqueness by a constant
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
-- Return one row
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2 AND j2.a = 2;
-- Remove SJ, define uniqueness by a constant expression
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
-- Return one row
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND j1.a = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = j2.a;
-- Remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
-- Return no rows
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 1 AND j2.a = 1;
-- Shuffle a clause. Remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
-- Return no rows
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 1 = j1.a AND j2.a = 1;
-- SJE Corner case: a 'a.x=a.x' clause, have replaced with 'a.x IS NOT NULL'
-- after SJ elimination it shouldn't be a mergejoinable clause.
EXPLAIN (COSTS OFF)
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
SELECT t4.*
FROM (SELECT t1.*, t2.a AS a1 FROM sj t1, sj t2 WHERE t1.b = t2.b) AS t3
JOIN sj t4 ON (t4.a = t3.a) WHERE t3.a1 = 42;
-- Functional index
CREATE UNIQUE INDEX sj_fn_idx ON sj((a * a));
-- Remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 1;
-- Don't remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a*j1.a = 1 AND j2.a*j2.a = 2;
-- Restriction contains expressions in both sides, Remove SJ.
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
-- Empty set of rows should be returned
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.a) = (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int
AND (EXTRACT(DOW FROM current_timestamp(0))/15 + 3)::int = (j2.a*j2.a);
-- Restriction contains volatile function - disable SJE feature.
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.c/3) = (random()/3 + 3)::int
AND (random()/3 + 3)::int = (j2.a*j2.c/3);
-- Return one row
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b
AND (j1.a*j1.c/3) = (random()/3 + 3)::int
AND (random()/3 + 3)::int = (j2.a*j2.c/3);
-- Multiple filters
CREATE UNIQUE INDEX sj_temp_idx1 ON sj(a,b,c);
-- Remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND j1.a = 2 AND j1.c = 3 AND j2.a = 2 AND 3 = j2.c;
-- Don't remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2
WHERE j1.b = j2.b AND 2 = j1.a AND j1.c = 3 AND j2.a = 1 AND 3 = j2.c;
CREATE UNIQUE INDEX sj_temp_idx ON sj(a,b);
-- Don't remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND j1.a = 2;
-- Don't remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND 2 = j2.a;
-- Don't remove SJ
EXPLAIN (COSTS OFF)
SELECT * FROM sj j1, sj j2 WHERE j1.b = j2.b AND (j1.a = 1 OR j2.a = 1);
DROP INDEX sj_fn_idx, sj_temp_idx1, sj_temp_idx;
-- Test that OR predicated are updated correctly after join removal
CREATE TABLE tab_with_flag ( id INT PRIMARY KEY, is_flag SMALLINT);
CREATE INDEX idx_test_is_flag ON tab_with_flag (is_flag);
EXPLAIN (COSTS OFF)
SELECT COUNT(*) FROM tab_with_flag
WHERE
(is_flag IS NULL OR is_flag = 0)
AND id IN (SELECT id FROM tab_with_flag WHERE id IN (2, 3));
DROP TABLE tab_with_flag;
-- HAVING clause
explain (costs off)
select p.b from sj p join sj q on p.a = q.a group by p.b having sum(p.a) = 1;
-- update lateral references and range table entry reference
explain (verbose, costs off)
select 1 from (select x.* from sj x, sj y where x.a = y.a) q,
lateral generate_series(1, q.a) gs(i);
explain (verbose, costs off)
select 1 from (select y.* from sj x, sj y where x.a = y.a) q,
lateral generate_series(1, q.a) gs(i);
-- Test that a non-EC-derived join clause is processed correctly. Use an
-- outer join so that we can't form an EC.
explain (costs off) select * from sj p join sj q on p.a = q.a
left join sj r on p.a + q.a = r.a;
-- FIXME this constant false filter doesn't look good. Should we merge
-- equivalence classes?
explain (costs off)
select * from sj p, sj q where p.a = q.a and p.b = 1 and q.b = 2;
-- Check that attr_needed is updated correctly after self-join removal. In this
-- test, the join of j1 with j2 is removed. k1.b is required at either j1 or j2.
-- If this info is lost, join targetlist for (k1, k2) will not contain k1.b.
-- Use index scan for k1 so that we don't get 'b' from physical tlist used for
-- seqscan. Also disable reordering of joins because this test depends on a
-- particular join tree.
create table sk (a int, b int);
create index on sk(a);
set join_collapse_limit to 1;
set enable_seqscan to off;
explain (costs off) select 1 from
(sk k1 join sk k2 on k1.a = k2.a)
join (sj j1 join sj j2 on j1.a = j2.a) on j1.b = k1.b;
explain (costs off) select 1 from
(sk k1 join sk k2 on k1.a = k2.a)
join (sj j1 join sj j2 on j1.a = j2.a) on j2.b = k1.b;
reset join_collapse_limit;
reset enable_seqscan;
-- Check that clauses from the join filter list is not lost on the self-join removal
CREATE TABLE emp1 (id SERIAL PRIMARY KEY NOT NULL, code int);
EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM emp1 e1, emp1 e2 WHERE e1.id = e2.id AND e2.code <> e1.code;
-- Shuffle self-joined relations. Only in the case of iterative deletion
-- attempts explains of these queries will be identical.
CREATE UNIQUE INDEX ON emp1((id*id));
EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c2.id AND c1.id*c2.id=c3.id*c3.id;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c1.id=c3.id AND c1.id*c3.id=c2.id*c2.id;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM emp1 c1, emp1 c2, emp1 c3
WHERE c3.id=c2.id AND c3.id*c2.id=c1.id*c1.id;
-- Check the usage of a parse tree by the set operations (bug #18170)
EXPLAIN (COSTS OFF)
SELECT c1.code FROM emp1 c1 LEFT JOIN emp1 c2 ON c1.id = c2.id
WHERE c2.id IS NOT NULL
EXCEPT ALL
SELECT c3.code FROM emp1 c3;
-- Check that SJE removes references from PHVs correctly
explain (costs off)
select * from emp1 t1 left join
(select coalesce(t3.code, 1) from emp1 t2
left join (emp1 t3 join emp1 t4 on t3.id = t4.id)
on true)
on true;
-- Check that SJE removes the whole PHVs correctly
explain (verbose, costs off)
select 1 from emp1 t1 left join
((select 1 as x, * from emp1 t2) s1 inner join
(select * from emp1 t3) s2 on s1.id = s2.id)
on true
where s1.x = 1;
-- Check that PHVs do not impose any constraints on removing self joins
explain (verbose, costs off)
select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
lateral (select t1.id as t1id, * from generate_series(1,1) t3) s on true;
explain (verbose, costs off)
select * from generate_series(1,10) t1(id) left join
lateral (select t1.id as t1id, t2.id from emp1 t2 join emp1 t3 on t2.id = t3.id)
on true;
-- Check that SJE replaces join clauses involving the removed rel correctly
explain (costs off)
select * from emp1 t1
inner join emp1 t2 on t1.id = t2.id
left join emp1 t3 on t1.id > 1 and t1.id < 2;
-- Check that SJE doesn't replace the target relation
EXPLAIN (COSTS OFF)
WITH t1 AS (SELECT * FROM emp1)
UPDATE emp1 SET code = t1.code + 1 FROM t1
WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
INSERT INTO emp1 VALUES (1, 1), (2, 1);
WITH t1 AS (SELECT * FROM emp1)
UPDATE emp1 SET code = t1.code + 1 FROM t1
WHERE t1.id = emp1.id RETURNING emp1.id, emp1.code, t1.code;
TRUNCATE emp1;
EXPLAIN (COSTS OFF)
UPDATE sj sq SET b = 1 FROM sj as sz WHERE sq.a = sz.a;
CREATE RULE sj_del_rule AS ON DELETE TO sj
DO INSTEAD
UPDATE sj SET a = 1 WHERE a = old.a;
EXPLAIN (COSTS OFF) DELETE FROM sj;
DROP RULE sj_del_rule ON sj CASCADE;
-- Check that SJE does not mistakenly omit qual clauses (bug #18187)
insert into emp1 values (1, 1);
explain (costs off)
select 1 from emp1 full join
(select * from emp1 t1 join
emp1 t2 join emp1 t3 on t2.id = t3.id
on true
where false) s on true
where false;
select 1 from emp1 full join
(select * from emp1 t1 join
emp1 t2 join emp1 t3 on t2.id = t3.id
on true
where false) s on true
where false;
-- Check that SJE does not mistakenly re-use knowledge of relation uniqueness
-- made with different set of quals
insert into emp1 values (2, 1);
explain (costs off)
select * from emp1 t1 where exists (select * from emp1 t2
where t2.id = t1.code and t2.code > 0);
select * from emp1 t1 where exists (select * from emp1 t2
where t2.id = t1.code and t2.code > 0);
-- We can remove the join even if we find the join can't duplicate rows and
-- the base quals of each side are different. In the following case we end up
-- moving quals over to s1 to make it so it can't match any rows.
create table sl(a int, b int, c int);
create unique index on sl(a, b);
vacuum analyze sl;
-- Both sides are unique, but base quals are different
explain (costs off)
select * from sl t1, sl t2 where t1.a = t2.a and t1.b = 1 and t2.b = 2;
-- Check NullTest in baserestrictinfo list
explain (costs off)
select * from sl t1, sl t2
where t1.a = t2.a and t1.b = 1 and t2.b = 2
and t1.c IS NOT NULL and t2.c IS NOT NULL
and t2.b IS NOT NULL and t1.b IS NOT NULL
and t1.a IS NOT NULL and t2.a IS NOT NULL;
explain (verbose, costs off)
select * from sl t1, sl t2
where t1.b = t2.b and t2.a = 3 and t1.a = 3
and t1.c IS NOT NULL and t2.c IS NOT NULL
and t2.b IS NOT NULL and t1.b IS NOT NULL
and t1.a IS NOT NULL and t2.a IS NOT NULL;
-- Join qual isn't mergejoinable, but inner is unique.
EXPLAIN (COSTS OFF)
SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a AND n2.a = 1;
EXPLAIN (COSTS OFF)
SELECT * FROM
(SELECT n2.a FROM sj n1, sj n2 WHERE n1.a <> n2.a) q0, sl
WHERE q0.a = 1;
-- Check optimization disabling if it will violate special join conditions.
-- Two identical joined relations satisfies self join removal conditions but
-- stay in different special join infos.
CREATE TABLE sj_t1 (id serial, a int);
CREATE TABLE sj_t2 (id serial, a int);
CREATE TABLE sj_t3 (id serial, a int);
CREATE TABLE sj_t4 (id serial, a int);
CREATE UNIQUE INDEX ON sj_t3 USING btree (a,id);
CREATE UNIQUE INDEX ON sj_t2 USING btree (id);
EXPLAIN (COSTS OFF)
SELECT * FROM sj_t1
JOIN (
SELECT sj_t2.id AS id FROM sj_t2
WHERE EXISTS
(
SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
)
) t2t3t4
ON sj_t1.id = t2t3t4.id
JOIN (
SELECT sj_t2.id AS id FROM sj_t2
WHERE EXISTS
(
SELECT TRUE FROM sj_t3,sj_t4 WHERE sj_t3.a = 1 AND sj_t3.id = sj_t2.id
)
) _t2t3t4
ON sj_t1.id = _t2t3t4.id;
--
-- Test RowMarks-related code
--
-- Both sides have explicit LockRows marks
EXPLAIN (COSTS OFF)
SELECT a1.a FROM sj a1,sj a2 WHERE (a1.a=a2.a) FOR UPDATE;
reset enable_hashjoin;
reset enable_mergejoin;
--
-- Test hints given on incorrect column references are useful
--

View File

@ -2593,6 +2593,7 @@ SeenRelsEntry
SelectLimit
SelectStmt
Selectivity
SelfJoinCandidate
SemTPadded
SemiAntiJoinFactors
SeqScan
@ -4056,6 +4057,7 @@ unicodeStyleColumnFormat
unicodeStyleFormat
unicodeStyleRowFormat
unicode_linestyle
UniqueRelInfo
unit_conversion
unlogged_relation_entry
utf_local_conversion_func