BRIN: Block Range Indexes

BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. BRIN indexes work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing it with the query quals; all pages in the range are returned in a lossy TID bitmap if the quals are consistent with the values in the summary tuple, otherwise not. Normal index scans are not supported because these indexes do not store TIDs.

As new tuples are added into the index, the summary information is updated (if the block range in which the tuple is added is already summarized) or not; in the latter case, a subsequent pass of VACUUM or the brin_summarize_new_values() function will create the summary information.

For data types with natural 1-D sort orders, the summary info consists of the maximum and the minimum values of each indexed column within each page range. This type of operator class we call "Minmax", and we supply a bunch of them for most data types with B-tree opclasses. Since the BRIN code is generalized, other approaches are possible for things such as arrays, geometric types, ranges, etc; even for things such as enum types we could do something different than minmax with better results. In this commit I only include minmax.

Catalog version bumped due to new builtin catalog entries.

There's more that could be done here, but this is a good step forwards.

Loosely based on ideas from Simon Riggs; code mostly by Álvaro Herrera, with contribution by Heikki Linnakangas.

Patch reviewed by: Amit Kapila, Heikki Linnakangas, Robert Haas.
Testing help from Jeff Janes, Erik Rijkers, Emanuel Calvo.

PS: The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633.
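The minmax summarization and the consistency check described above can be illustrated with a small standalone sketch. This is not the code from this commit: the names RangeSummary, summary_add_value, summary_consistent, summarize_values and count_matching_ranges are hypothetical, the column is assumed to be a plain int, and a fixed number of values per block range stands in for real heap pages. The sketch also assumes that a range with no summary yet must be treated as a possible match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary for one block range of an int column. */
typedef struct RangeSummary
{
	bool		summarized;		/* false until a first value is folded in */
	int			min;
	int			max;
} RangeSummary;

/* Fold one value into a range summary, widening min/max as needed. */
static void
summary_add_value(RangeSummary *s, int value)
{
	if (!s->summarized)
	{
		s->min = s->max = value;
		s->summarized = true;
		return;
	}
	if (value < s->min)
		s->min = value;
	if (value > s->max)
		s->max = value;
}

/*
 * Can any value in the range satisfy "col BETWEEN lo AND hi"?
 * Assumption: an unsummarized range cannot be excluded, so it matches.
 */
static bool
summary_consistent(const RangeSummary *s, int lo, int hi)
{
	if (!s->summarized)
		return true;
	return s->max >= lo && s->min <= hi;
}

/* Build summaries; "out" must be zero-initialized by the caller. */
static void
summarize_values(const int *values, size_t nvalues,
				 size_t values_per_range, RangeSummary *out)
{
	assert(values_per_range > 0);
	for (size_t i = 0; i < nvalues; i++)
		summary_add_value(&out[i / values_per_range], values[i]);
}

/* Count the ranges a lossy bitmap would have to include for the query. */
static size_t
count_matching_ranges(const RangeSummary *ranges, size_t nranges,
					  int lo, int hi)
{
	size_t		matches = 0;

	for (size_t i = 0; i < nranges; i++)
		if (summary_consistent(&ranges[i], lo, hi))
			matches++;
	return matches;
}
```

Because only the minimum and maximum are kept, a matching range may still contain no qualifying tuples, which is why the resulting bitmap is lossy and the heap tuples have to be rechecked.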
2025-12-07 12:02:30 +03:00 · 2014-11-07 16:38:14 -03:00
parent 1961b1c131
commit 7516f52594
57 changed files with 6807 additions and 24 deletions
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6081,7 +6081,7 @@ genericcostestimate(PlannerInfo *root,
 	else
 		numIndexPages = 1.0;

-	/* fetch estimated page cost for schema containing index */
+	/* fetch estimated page cost for tablespace containing index */
 	get_tablespace_page_costs(index->reltablespace,
 							  &spc_random_page_cost,
 							  NULL);
@@ -7162,7 +7162,7 @@ gincostestimate(PG_FUNCTION_ARGS)
 											   JOIN_INNER,
 											   NULL);

-	/* fetch estimated page cost for schema containing index */
+	/* fetch estimated page cost for tablespace containing index */
 	get_tablespace_page_costs(index->reltablespace,
 							  &spc_random_page_cost,
 							  NULL);
@@ -7349,3 +7349,73 @@ gincostestimate(PG_FUNCTION_ARGS)

 	PG_RETURN_VOID();
 }
+
+/*
+ * BRIN has search behavior completely different from other index types
+ */
+Datum
+brincostestimate(PG_FUNCTION_ARGS)
+{
+	PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
+	IndexPath  *path = (IndexPath *) PG_GETARG_POINTER(1);
+	double		loop_count = PG_GETARG_FLOAT8(2);
+	Cost	   *indexStartupCost = (Cost *) PG_GETARG_POINTER(3);
+	Cost	   *indexTotalCost = (Cost *) PG_GETARG_POINTER(4);
+	Selectivity *indexSelectivity = (Selectivity *) PG_GETARG_POINTER(5);
+	double	   *indexCorrelation = (double *) PG_GETARG_POINTER(6);
+	IndexOptInfo *index = path->indexinfo;
+	List	   *indexQuals = path->indexquals;
+	List	   *indexOrderBys = path->indexorderbys;
+	double		numPages = index->pages;
+	double		numTuples = index->tuples;
+	Cost		spc_seq_page_cost;
+	Cost		spc_random_page_cost;
+	QualCost	index_qual_cost;
+	double		qual_op_cost;
+	double		qual_arg_cost;
+
+	/* fetch estimated page cost for tablespace containing index */
+	get_tablespace_page_costs(index->reltablespace,
+							  &spc_random_page_cost,
+							  &spc_seq_page_cost);
+
+	/*
+	 * BRIN indexes are always read in full; use that as startup cost.
+	 * XXX maybe only include revmap pages here?
+	 */
+	*indexStartupCost = spc_seq_page_cost * numPages * loop_count;
+
+	/*
+	 * To read a BRIN index there might be a bit of back and forth over regular
+	 * pages, as revmap might point to them out of sequential order; calculate
+	 * this as reading the whole index in random order.
+	 */
+	*indexTotalCost = spc_random_page_cost * numPages * loop_count;
+
+	*indexSelectivity =
+		clauselist_selectivity(root, path->indexquals,
+							   path->indexinfo->rel->relid,
+							   JOIN_INNER, NULL);
+	*indexCorrelation = 1;
+
+	/*
+	 * Add on index qual eval costs, much as in genericcostestimate.
+	 */
+	cost_qual_eval(&index_qual_cost, indexQuals, root);
+	qual_arg_cost = index_qual_cost.startup + index_qual_cost.per_tuple;
+	cost_qual_eval(&index_qual_cost, indexOrderBys, root);
+	qual_arg_cost += index_qual_cost.startup + index_qual_cost.per_tuple;
+	qual_op_cost = cpu_operator_cost *
+		(list_length(indexQuals) + list_length(indexOrderBys));
+	qual_arg_cost -= qual_op_cost;
+	if (qual_arg_cost < 0)		/* just in case... */
+		qual_arg_cost = 0;
+
+	*indexStartupCost += qual_arg_cost;
+	*indexTotalCost += qual_arg_cost;
+	*indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost);
+
+	/* XXX what about pages_per_range? */
+
+	PG_RETURN_VOID();
+}
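
The arithmetic in brincostestimate() can be followed more easily outside the planner. The sketch below restates the formulas from the hunk above as a standalone function; BrinCostInputs, BrinCostEstimate and brin_cost_sketch are hypothetical names introduced for illustration, and the inputs that the real function obtains from the planner (page costs, selectivity, qual costs) are simply passed in by the caller.

```c
#include <assert.h>

/* Hypothetical container for the values the real function gets from the planner. */
typedef struct BrinCostInputs
{
	double		num_pages;				/* pages in the index */
	double		num_tuples;				/* tuples in the index */
	double		loop_count;				/* number of repetitions of the scan */
	double		selectivity;			/* fraction of tuples matching the quals */
	double		seq_page_cost;
	double		random_page_cost;
	double		cpu_index_tuple_cost;
	double		qual_op_cost;			/* cpu_operator_cost * number of quals */
	double		qual_arg_cost;			/* cost of evaluating qual arguments */
} BrinCostInputs;

typedef struct BrinCostEstimate
{
	double		startup;
	double		total;
} BrinCostEstimate;

/* Mirror of the cost formulas in the diff, under the assumptions stated above. */
static BrinCostEstimate
brin_cost_sketch(const BrinCostInputs *in)
{
	BrinCostEstimate est;

	assert(in->num_pages >= 0 && in->loop_count >= 0);

	/* Startup: the whole index read sequentially, plus qual argument cost. */
	est.startup = in->seq_page_cost * in->num_pages * in->loop_count
		+ in->qual_arg_cost;

	/* Total: the whole index read in random order, plus per-tuple CPU cost. */
	est.total = in->random_page_cost * in->num_pages * in->loop_count
		+ in->qual_arg_cost
		+ (in->num_tuples * in->selectivity)
		* (in->cpu_index_tuple_cost + in->qual_op_cost);

	return est;
}
```

With 10 index pages, 1000 index tuples, a selectivity of 0.1, one loop, and the stock settings seq_page_cost = 1.0, random_page_cost = 4.0, cpu_index_tuple_cost = 0.005 and a single qual at cpu_operator_cost = 0.0025, the startup cost comes to 10 and the total to roughly 40.75. Note that the total is computed independently from the random page cost rather than by adding to the startup figure.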