Detect redundant GROUP BY columns using UNIQUE indexes

d4c3a156c added support that when the GROUP BY contained all of the columns belonging to a relation's PRIMARY KEY, all other columns belonging to that relation would be removed from the GROUP BY clause. That's possible because all other columns are functionally dependent on the PRIMARY KEY and those columns alone ensure the groups are distinct. Here we expand on that optimization and allow it to work for any unique indexes on the table rather than just the PRIMARY KEY index. This normally requires that all columns in the index are defined with NOT NULL, however, we can relax that requirement when the index is defined with NULLS NOT DISTINCT. When there are multiple suitable indexes to allow columns to be removed, we prefer the index with the least number of columns as this allows us to remove the highest number of GROUP BY columns. One day, we may want to revisit that decision as it may make more sense to use the narrower set of columns in terms of the width of the data types and stored/queried data. This also adjusts the code to make use of RelOptInfo.indexlist rather than looking up the catalog tables. In passing, add another short-circuit path to allow bailing out earlier in cases where it's certainly not possible to remove redundant GROUP BY columns. This early exit is now cheaper to do than when this code was originally written as 00b41463c made it cheaper to check for empty Bitmapsets. Patch originally by Zhang Mingli and later worked on by jian he, but after I (David) worked on it, there was very little of the original left. Author: Zhang Mingli, jian he, David Rowley Reviewed-by: jian he, Andrei Lepikhov Discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0%40Spark
2026-01-05 23:38:41 +03:00 · 2024-12-12 15:28:38 +13:00
parent d8f335156c
commit bd10ec5297
5 changed files with 200 additions and 22 deletions
--- a/src/test/regress/sql/aggregates.sql
+++ b/src/test/regress/sql/aggregates.sql
@@ -507,6 +507,38 @@ create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;

+create unique index t3_c_uidx on t3(c);
+
+-- Ensure we don't remove any columns from the GROUP BY for a unique
+-- index on a NULLable column.
+explain (costs off) select b,c from t3 group by b,c;
+
+-- Make the column NOT NULL and ensure we remove the redundant column
+alter table t3 alter column c set not null;
+explain (costs off) select b,c from t3 group by b,c;
+
+-- When there are multiple supporting unique indexes and the GROUP BY contains
+-- columns to cover all of those, ensure we pick the index with the least
+-- number of columns so that we can remove more columns from the GROUP BY.
+explain (costs off) select a,b,c from t3 group by a,b,c;
+
+-- As above but try ordering the columns differently to ensure we get the
+-- same result.
+explain (costs off) select a,b,c from t3 group by c,a,b;
+
+-- Ensure we don't use a partial index as proof of functional dependency
+drop index t3_c_uidx;
+create index t3_c_uidx on t3 (c) where c > 0;
+explain (costs off) select b,c from t3 group by b,c;
+
+-- A unique index defined as NULLS NOT DISTINCT does not need a supporting NOT
+-- NULL constraint on the indexed columns.  Ensure the redundant columns are
+-- removed from the GROUP BY for such a table.
+drop index t3_c_uidx;
+alter table t3 alter column c drop not null;
+create unique index t3_c_uidx on t3(c) nulls not distinct;
+explain (costs off) select b,c from t3 group by b,c;
+
 drop table t1 cascade;
 drop table t2;
 drop table t3;