1
0
mirror of https://github.com/postgres/postgres.git synced 2025-08-19 23:22:23 +03:00

Reduce non-leaf keys overlap in GiST indexes produced by a sorted build

The GiST sorted build currently chooses split points according to the only page
space utilization.  That may lead to higher non-leaf keys overlap and, in turn,
slower search query answers.

This commit makes the sorted build use the opclass's picksplit method.  Once
four pages at the level are accumulated, the picksplit method is applied until
each split partition fits the page.  Some of our split algorithms could show
significant performance degradation while processing 4-times more data at once.
But those opclasses haven't received the sorted build support and shouldn't
receive it before their split algorithms are improved.

Discussion: https://postgr.es/m/CAHqSB9jqtS94e9%3D0vxqQX5dxQA89N95UKyz-%3DA7Y%2B_YJt%2BVW5A%40mail.gmail.com
Author: Aliaksandr Kalenik, Sergei Shoulbakov, Andrey Borodin
Reviewed-by: Björn Harrtell, Darafei Praliaskouski, Andres Freund
Reviewed-by: Alexander Korotkov
This commit is contained in:
Alexander Korotkov
2022-02-07 23:20:42 +03:00
parent 42a9e88bf6
commit f1ea98a797
4 changed files with 199 additions and 112 deletions

View File

@@ -26,6 +26,7 @@ The current implementation of GiST supports:
* Concurrency
* Recovery support via WAL logging
* Buffering build algorithm
* Sorted build method
The support for concurrency implemented in PostgreSQL was developed based on
the paper "Access Methods for Next-Generation Database Systems" by
@@ -414,6 +415,21 @@ emptied yet; tuples never move upwards in the tree. The final emptying loops
through buffers at a given level until all buffers at that level have been
emptied, and then moves down to the next level.
Sorted build method
-------------------
Sort all input tuples, pack them into GiST leaf pages in the sorted order,
and create downlinks and internal pages as we go. This method builds the index
from the bottom up, similar to how the B-tree index is built.
The sorted method is used if the operator classes for all columns have a
"sortsupport" defined. Otherwise, we fall back on inserting tuples one by one
with optional buffering.
Sorting GiST build requires good linearization of the sort opclass. That is not
always the case in multidimensional data. To tackle the anomalies, we buffer
index tuples and apply a picksplit function that can be multidimensional-aware.
Bulk delete algorithm (VACUUM)
------------------------------