Given that idx is a RH hashmap bucket number and info is intra-bucket idx
the root cause is triggered by the difference of idx/hash pair
calculation for a certain GROUP BY generation and for generation
aggregations merging that takes place in RowAggStorage::finalize.
This patch generalizes rowHashToIdx to leverage it in both cases
mentioned above.
in aggregation code
The patch disables padding that forces hasher to calculate over the whole 2k buffer. This patch also moves hashing code
into the common place where it belongs.
exact functionality that does not use MDB hash function.
This patch also takes a bit from Robin Hood hash map implementation forgotten
that reduces hash function collision rate.
* Introduce multigeneration aggregation
* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)
* Reduce memory usage while aggregating
* introduce in-memory generations to better memory utilization
* Try to limit the qty of buckets at a low limit
* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory
* do not dump more than half of rowgroups to disk if generations are
allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
so the generations start more evenly
* Unify temp data location
* Explicitly create temp subdirectories
whether disk aggregation/join are enabled or not