* fix(rowgroup): RGData now uses a uint64_t counter for the fixed-size columns data buffer.
The buffer can then use more than 4GB of RAM, which is necessary for PM-side joins.
Previously the RGData ctor used uint32_t when allocating the data buffer,
which caused an implicit heap overflow.
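
A minimal sketch of the failure mode described above, assuming the buffer size is a product of a row size and a row count. The helper names are hypothetical and are not part of the RGData API; they only contrast 32-bit and 64-bit size arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper, not the RGData API: sizes a buffer with 32-bit math.
// Unsigned 32-bit multiplication wraps modulo 2^32, so a request above 4GB
// silently yields a much smaller allocation size.
constexpr uint32_t bufferSize32(uint32_t rowSize, uint32_t rowCount)
{
  return rowSize * rowCount;
}

// Hypothetical helper: widens one operand before multiplying, so the
// product keeps all of its bits.
constexpr uint64_t bufferSize64(uint32_t rowSize, uint32_t rowCount)
{
  return static_cast<uint64_t>(rowSize) * rowCount;
}
```

With 1024-byte rows and 5 * 1024 * 1024 rows (5 GiB in total), the 32-bit version returns 1073741824 (1 GiB), so later writes past that point land outside the allocation; the 64-bit version returns 5368709120.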
* feat(bytestream,serdes): BS buffer size type is now uint64_t
This is necessary to handle 64-bit RGData, which comes as
a separate patch. Together the two patches allow
PM joins when the SmallSide size is > 4GB.
* feat(bytestream,serdes): Propagate the BS buffer size type change to avoid implicit narrowing conversions
* feat(rowgroup): restore bits lost during cherry-pick. The missing bits caused the first RGData::serialize call to crash the process
Given that idx is an RH hashmap bucket number and info is the intra-bucket index,
the root cause is a difference in how the idx/hash pair is
calculated for a given GROUP BY generation and for the merging of
generation aggregations that takes place in RowAggStorage::finalize.
This patch generalizes rowHashToIdx so that it can be used in both cases
mentioned above.
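
The idea of routing both code paths through one hash-to-index helper can be sketched as follows. This is an illustration only: `BucketPos`, `hashToPos`, `posForInsert`, `posForMerge` and the bit layout (low bits feed info, the remaining bits select the bucket) are assumptions, not the actual rowHashToIdx implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type: bucket number plus intra-bucket info value.
struct BucketPos
{
  std::size_t idx;
  uint32_t info;
};

// Hypothetical shared helper. The low infoBits bits of the hash become
// info; the remaining bits, masked by bucketMask, select the bucket.
constexpr BucketPos hashToPos(uint64_t hash, uint64_t bucketMask, unsigned infoBits)
{
  return BucketPos{static_cast<std::size_t>((hash >> infoBits) & bucketMask),
                   static_cast<uint32_t>(hash & ((uint64_t{1} << infoBits) - 1))};
}

// Both the per-generation path and the merge path call the same helper
// with the same parameters, so they cannot disagree on where a row lives.
constexpr BucketPos posForInsert(uint64_t hash, uint64_t bucketMask)
{
  return hashToPos(hash, bucketMask, 5);
}

constexpr BucketPos posForMerge(uint64_t hash, uint64_t bucketMask)
{
  return hashToPos(hash, bucketMask, 5);
}
```

Keeping the calculation in one place means a later change to the layout cannot be applied to one path and missed in the other.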
in aggregation code
The patch disables the padding that forces the hasher to calculate over the whole 2k buffer. It also moves the hashing code
into the common place where it belongs.
exact functionality that does not use MDB hash function.
This patch also brings in a previously forgotten bit of the Robin Hood hash map implementation
that reduces the hash function collision rate.
* Introduce multigeneration aggregation
* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)
* Reduce memory usage while aggregating
* introduce in-memory generations for better memory utilization
* Try to keep the number of buckets at a low limit
* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory
* do not dump more than half of rowgroups to disk if generations are
allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
so the generations start more evenly
* Unify temp data location
* Explicitly create temp subdirectories
regardless of whether disk aggregation/join is enabled
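
A minimal sketch of creating the temp subdirectories unconditionally and reporting a readable error on failure. It uses `std::filesystem` (C++17) for illustration, which may differ from what the patch itself uses; `ensureTempSubdirs` and the subdirectory names in the usage note are hypothetical.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: creates tmpRoot/<name> for every requested name,
// no matter which features are enabled. Returns an empty string on
// success, or the failing path plus the system error text.
inline std::string ensureTempSubdirs(const fs::path& tmpRoot,
                                     const std::vector<std::string>& names)
{
  for (const auto& name : names)
  {
    std::error_code ec;
    // Already existing directories are not an error: ec stays clear.
    fs::create_directories(tmpRoot / name, ec);
    if (ec)
      return (tmpRoot / name).string() + ": " + ec.message();
  }
  return {};
}
```

A caller could run `ensureTempSubdirs(tmpRoot, {"aggregates", "joins"})` once at startup and log the returned text if it is not empty.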