1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00
Commit Graph

15 Commits

Author SHA1 Message Date
bd0f59910a feat(PrimProc): MCOL-5950 Improve disk-based aggregation finalization (#3525)
* feat(PrimProc): MCOL-5950 Improve disk-based aggregation finalization

    Iterate over the rows in the plain vector of RGData instead of
    iterating over the hashmap. This reduces the complexity and speeds
    up finalization (by up to the twice in the certain cases)

* replace magic constant with muggle constant
2025-05-21 10:53:48 +01:00
e0a01c6cf4 Reapply "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145)"
This reverts commit a5c12b98d7.
2024-12-11 12:02:24 +00:00
a5c12b98d7 Revert "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145)"
This reverts commit c7caa4374f.
2024-07-07 13:09:56 +00:00
c7caa4374f fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145)
* fix(aggregation, disk-based): MCOL-5689 this fixes disk-based distinct aggregation functions
Previously disk-based distinct aggregation functions produced incorrect results b/c there was no finalization applied for previous generations stored on disk.

*  fix(aggregation, disk-based): Fix disk-based COUNT(DISTINCT ...) queries. (Case 2). (Distinct & Multi-Distinct, Single- & Multi-Threaded).

* fix(aggregation, disk-based): Fix disk-based DISTINCT & GROUP BY queries. (Case 1). (Distinct & Multi-Distinct, Single- & Multi-Threaded).

---------

Co-authored-by: Theresa Hradilak <theresa.hradilak@gmail.com>
Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
2024-03-24 18:04:37 +03:00
63b032e3fd fix(rowstorage): SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock (#3050) 2023-12-05 18:30:31 +03:00
688b47d4e7 MCOL-5451 This resolves external GROUP BY result inconsistency issues
Given that idx is a RH hashmap bucket number and info is intra-bucket idx
    the root cause is triggered by the difference of idx/hash pair
    calculation for a certain GROUP BY generation and for generation
    aggregations merging that takes place in RowAggStorage::finalize.
    This patch generalizes rowHashToIdx to leverage it in both cases
    mentioned above.
2023-03-25 15:04:16 +00:00
15ce531270 Randomly start a new generation if the free memory is less than 30% 2022-08-24 14:00:37 +00:00
dd96e686c0 MCOL-5153 This patch replaces MDB collation aware hash function with the (#2488)
exact functionality that does not use MDB hash function.
This patch also takes a bit from Robin Hood hash map implementation forgotten
that reduces hash function collision rate.
2022-08-07 02:36:03 +03:00
6b17c358c0 MCOL-5153 This increases the size of the multiplier in the guarding check in RowAggStorage::increaseSize() so that it doesn't throw w/o a reason (#2463) 2022-07-22 10:19:36 -05:00
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
01f3ceb437 replace header guards with #pragma once 2022-01-21 15:24:58 +00:00
6a4140394d [MCOL-4829] More accurate memory counting 2021-09-07 19:52:20 +03:00
7fea3c988e [MCOL-4829] Compression for the temp disk-based aggregation files 2021-09-02 19:30:25 +03:00
5c5f103f98 MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
475104e4d3 [MCOL-4709] Disk-based aggregation
* Introduce multigeneration aggregation

* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)

* Reduce memory usage while aggregating
* introduce in-memory generations to better memory utilization

* Try to limit the qty of buckets at a low limit

* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory

* do not dump more than half of rowgroups to disk if generations are
  allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
  so the generations start more evenly

* Unify temp data location

* Explicitly create temp subdirectories
  whether disk aggregation/join are enabled or not
2021-06-06 16:09:15 +03:00