When serializing GIN tuples to tuplesorts during parallel index builds, we can significantly reduce the amount of data by compressing the TID lists. The GIN opclasses may produce a lot of data (depending on how many keys are extracted from each row), and the TID compression is very efficient and effective. If the number of distinct keys is high, the first worker pass (reading data from the table and writing them into a private tuplesort) may not benefit from the compression very much. It is likely to spill data to disk before the TID lists get long enough for the compression to help. The second pass (writing the merged data into the shared tuplesort) is more likely to benefit from compression. The compression can be seen as a way to reduce the amount of disk space needed by the parallel builds, because the data is written twice. First into the per-worker tuplesorts, then into the shared tuplesort. Author: Tomas Vondra Reviewed-by: Matthias van de Meent, Andy Fan, Kirill Reshke Discussion: https://postgr.es/m/6ab4003f-a8b8-4d75-a67f-f25ad98582dc%40enterprisedb.com
PostgreSQL Database Management System
This directory contains the source code distribution of the PostgreSQL database management system.
PostgreSQL is an advanced object-relational database management system that supports an extended subset of the SQL standard, including transactions, foreign keys, subqueries, triggers, user-defined types and functions. This distribution also contains C language bindings.
Copyright and license information can be found in the file COPYRIGHT.
General documentation about this version of PostgreSQL can be found at https://www.postgresql.org/docs/devel/. In particular, information about building PostgreSQL from the source code can be found at https://www.postgresql.org/docs/devel/installation.html.
The latest version of this software, and related software, may be obtained at https://www.postgresql.org/download/. For more information look at our web site located at https://www.postgresql.org/.