* feat(PrimProc): MCOL-5950 Improve disk-based aggregation finalization
Iterate over the rows in the plain vector of RGData instead of iterating
over the hashmap. This reduces the complexity and speeds up finalization
(by up to 2x in certain cases), as sketched below.
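A minimal sketch of the difference, using hypothetical stand-in types
rather than the real RGData/RowGroup API:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical simplified stand-ins for the real row/RGData types.
struct Row { uint64_t key; int64_t agg; };
using RGData = std::vector<Row>;

// Before: walk the hashmap, chasing pointers into scattered rows and
// paying the bucket-traversal overhead.
void finalizeViaHashmap(std::unordered_map<uint64_t, Row*>& groups) {
  for (auto& [hash, row] : groups)
    row->agg += 1;  // stand-in for the per-row finalization work
}

// After: walk the plain vector of RGData sequentially; every stored row
// is visited exactly once with cache-friendly, in-order access.
void finalizeViaVector(std::vector<RGData>& rgDatas) {
  for (auto& rg : rgDatas)
    for (auto& row : rg)
      row.agg += 1;  // same per-row finalization work
}
```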
* replace a magic constant with a named constant
* move GROUP_CONCAT/JSON_ARRAYAGG storage from the RowAggregation*
classes to the RowGroup
* (de)serialize the internal data structures (sketched after this list)
* get rid of specialized classes for processing JSON_ARRAYAGG
* move the memory accounting to disk-based aggregation classes
* allow aggregation generations to be used for queries with
GROUP_CONCAT/JSON_ARRAYAGG
* Remove the thread id from the error message, as it interferes with MTR
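The (de)serialization mentioned above is roughly this shape; the buffer
type and the field layout here are assumptions for illustration, not the
engine's actual stream classes:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical flat byte buffer standing in for the engine's stream type.
struct Buf {
  std::vector<uint8_t> bytes;
  size_t rpos = 0;
  void put(const void* p, size_t n) {
    auto* b = static_cast<const uint8_t*>(p);
    bytes.insert(bytes.end(), b, b + n);
  }
  void get(void* p, size_t n) {
    std::memcpy(p, bytes.data() + rpos, n);
    rpos += n;
  }
};

// Length-prefix the hashmap's raw arrays (info bytes + row hashes) so a
// generation can be dumped to disk and restored without rebuilding it.
void save(Buf& out, const std::vector<uint8_t>& info,
          const std::vector<uint64_t>& hashes) {
  uint64_t n = info.size();
  out.put(&n, sizeof(n));
  out.put(info.data(), n);
  uint64_t m = hashes.size();
  out.put(&m, sizeof(m));
  out.put(hashes.data(), m * sizeof(uint64_t));
}

void load(Buf& in, std::vector<uint8_t>& info,
          std::vector<uint64_t>& hashes) {
  uint64_t n = 0; in.get(&n, sizeof(n));
  info.resize(n); in.get(info.data(), n);
  uint64_t m = 0; in.get(&m, sizeof(m));
  hashes.resize(m); in.get(hashes.data(), m * sizeof(uint64_t));
}
```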
The buffer can utilize more than 4 GB of RAM, which is necessary for the
PM-side join, but the RGData ctor uses a uint32_t when allocating the
data buffer, so larger sizes silently wrap around. This causes an
implicit heap overflow, as illustrated below.
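A self-contained illustration of the truncation (the row counts and sizes
are made up; only the uint32_t wraparound mirrors the bug):

```cpp
#include <cstdint>
#include <iostream>

int main() {
  // 5 GB worth of rows: fits in 64 bits, overflows a 32-bit size.
  uint64_t rowCount = 20'000'000;
  uint64_t rowSize  = 256;                           // bytes per row
  uint64_t need64   = rowCount * rowSize;            // 5'120'000'000 bytes
  uint32_t need32   = static_cast<uint32_t>(need64); // silently truncated

  std::cout << "64-bit size: " << need64 << "\n"   // 5120000000
            << "32-bit size: " << need32 << "\n";  // 825032704 -- too small
  // Allocating need32 bytes and then writing need64 bytes of row data
  // overruns the heap allocation.
}
```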
Given that idx is a Robin Hood hashmap bucket number and info is the
intra-bucket index, the root cause is a discrepancy between the idx/hash
pair calculation for a given GROUP BY generation and the one used when
merging generation aggregations in RowAggStorage::finalize. This patch
generalizes rowHashToIdx so that it can be used in both cases (see the
sketch below).
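A minimal sketch of what such a hash-to-idx/info split might look like;
the parameter names (mask, infoInc, infoHashShift) are assumptions, not
the actual RowAggStorage members:

```cpp
#include <cstddef>
#include <cstdint>

// Robin-Hood-style split of a 64-bit row hash into a bucket index and an
// intra-bucket "info" distance byte.
void rowHashToIdx(uint64_t hash, uint64_t mask, uint8_t infoInc,
                  unsigned infoHashShift, uint64_t& info, size_t& idx) {
  // The top bits of the hash become the intra-bucket info byte...
  info = infoInc + static_cast<uint8_t>(hash >> infoHashShift);
  // ...and the low bits select the bucket. Both the per-generation build
  // and the finalize-time merge must derive idx/info from the hash
  // identically, or the merge looks up the wrong slot.
  idx = static_cast<size_t>(hash & mask);
}
```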
In the aggregation code, the patch disables the padding that forces the
hasher to calculate over the whole 2 KB row buffer, and moves the hashing
code into the common place where it belongs. The common hasher keeps the
exact functionality but does not use the MDB hash function. This patch
also restores a detail from the Robin Hood hash map implementation that
had been left out, which reduces the hash function collision rate
(sketched below).
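A hedged sketch of hashing only the occupied bytes plus an extra
avalanche step; it uses FNV-1a and the MurmurHash3 finalizer purely for
illustration, not the engine's actual hasher:

```cpp
#include <cstddef>
#include <cstdint>

// Hash only the bytes a row actually occupies instead of the padded 2 KB
// buffer. The tail mix (borrowed here from the MurmurHash3 finalizer, in
// the spirit of what Robin Hood maps do) spreads entropy into the low
// bits that the bucket mask consumes, cutting the collision rate.
uint64_t hashRow(const uint8_t* data, size_t usedLen) {
  uint64_t h = 1469598103934665603ULL;  // FNV-1a offset basis
  for (size_t i = 0; i < usedLen; ++i) {
    h ^= data[i];
    h *= 1099511628211ULL;              // FNV-1a prime
  }
  h ^= h >> 33;                         // final avalanche mix
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  return h;
}
```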
* Introduce multigeneration aggregation
* Do not save the unused parts of RGDatas to disk
* Add IO error explanation (strerror)
* Reduce memory usage while aggregating
* introduce in-memory generations for better memory utilization
* Try to keep the number of buckets low
* Refactor disk aggregation a bit
* pass the calculated hash into RowAggregation
* try to keep some RGDatas with free space in memory
* do not dump more than half of the rowgroups to disk if generations are
allowed; instead, start a new generation (see the sketch after this list)
* for each thread, shift the first processed bucket at each iteration so
the generations start more evenly
* Unify temp data location
* Explicitly create temp subdirectories
regardless of whether disk aggregation/join is enabled
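For the "do not dump more than half" rule above, the decision might look
like this sketch (all names are hypothetical):

```cpp
#include <cstddef>

// Per-generation bookkeeping; names are illustrative, not the real ones.
struct GenState {
  size_t inMemory = 0;  // rowgroups currently held in RAM
  size_t dumped   = 0;  // rowgroups already spilled in this generation
};

enum class Action { DumpToDisk, StartNewGeneration };

// Once half of the in-memory rowgroups have already been spilled, dumping
// more would mostly thrash the disk, so seal the current generation and
// start a fresh one instead.
Action onMemoryPressure(const GenState& s, bool generationsAllowed) {
  if (generationsAllowed && s.dumped >= s.inMemory / 2)
    return Action::StartNewGeneration;
  return Action::DumpToDisk;
}
```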