mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00

Author	SHA1	Message	Date
Leonid Fedorov	432d0cf7f8	fix(build): replace std::ranges usage for gcc9 and std::span with boost::span	2025-04-15 17:58:11 +04:00
Aleksei Antipovskii	aa5f4fc5e7	fix(aggregation): fix dumping RGDatas to disk `amount` parameter of `RGData::serialize` is the raw size of `rowData` buffer without StringStore/UserDataStore/etc	2025-04-11 15:21:07 +02:00
Aleksei Antipovskii	21ebd1ac20	feat(bytestream): serialize long strings in the common way	2025-04-11 15:21:07 +02:00
Aleksei Antipovskii	4bea7e59a0	feat(PrimProc): MCOL-5852 disk-based GROUP_CONCAT & JSON_ARRAYAGG * move GROUP_CONCAT/JSON_ARRAYAGG storage to the RowGroup from the RowAggregation* * internal data structures (de)serialization * get rid of a specialized classes for processing JSON_ARRAYAGG * move the memory accounting to disk-based aggregation classes * allow aggregation generations to be used for queries with GROUP_CONCAT/JSON_ARRAYAGG * Remove the thread id from the error message as it interferes with the mtr	2025-04-11 15:21:07 +02:00
Aleksei Antipovskii	87d47fd7ae	fix(PrimProc): MCOL-5852 use only long string storage for the group_concat data to reduce memory usage	2025-04-11 15:21:07 +02:00
Akhmad Oripov	a6ab9bd615	fix(funcexp): MCOL-5386 Bitwise aggregation functions do not work with wide decimals (internal error) (#3485)	2025-04-08 16:47:47 +01:00
drrtuy	04b44a835e	fix(rowgroup): fix for the forgotten attributes assignment	2025-03-27 22:12:48 +00:00
drrtuy	b649af5a0c	chore(): merge cleanup	2025-03-27 22:12:48 +00:00
drrtuy	b14613a66b	fix(aggregation): replaced instances with references	2025-03-27 22:12:48 +00:00
drrtuy	be5711cf0d	feat(): replace getMaxDataSize with getMaxDataSizeWithStrings to accurately account for mem	2025-03-27 22:12:48 +00:00
drrtuy	a4c4d33ee7	feat(): zerocopy TNS case and JOIN results RGData with CountingAllocator	2025-03-27 22:12:48 +00:00
drrtuy	3dfc8cd454	feat(): first cleanup	2025-03-27 22:12:48 +00:00
drrtuy	a6de8ec1ac	feat(): dangling pointer/ref issue has been solved for both RGData and BS	2025-03-27 22:12:48 +00:00
drrtuy	71ed9cabe0	feat(): propagate long strings SP type change	2025-03-27 22:12:48 +00:00
drrtuy	4e86123a5a	feat(): use boost::make_shared b/c most distros can't do allocate_shared for array types.	2025-03-27 22:12:48 +00:00
drrtuy	5f1bd3be12	feat(RGData,StringStore): add counting allocator capabilities to those ctors used in BPP::execute()	2025-03-27 22:12:48 +00:00
Aleksei Antipovskii	0ab03c7258	chore(codestyle): mark virtual methods as override	2025-02-21 20:01:34 +04:00
Sergey Zefirov	60dc7550f1	fix(group by, having): MCOL-5776: GROUP BY/HAVING closer to server's (#3371 ) This patch introduces an internal aggregate operator SELECT_SOME that is automatically added to columns that are not in GROUP BY. It "computes" some plausible value of the column (actually, last one passed). Along the way it fixes incorrect handling of HAVING being transferred into WHERE, window function handling and a bit of other inconsistencies.	2024-12-20 19:11:47 +00:00
Aleksei Antipovskii	e0a01c6cf4	Reapply "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 )" This reverts commit `a5c12b98d7`.	2024-12-11 12:02:24 +00:00
drrtuy	0a71892d97	feat(rowgroup): this returns bits lost during cherry-pick. The bits lost caused the first RGData::serialize to crash a process	2024-11-08 16:28:51 +04:00
drrtuy	dc03621e9d	fix(rowgroup): RGData now uses uint64_t counter for the fixed sizes columns data buf. The buffer can utilize > 4GB RAM that is necessary for PM side join. RGData ctor uses uint32_t allocating data buffer. This fact causes implicit heap overflow.	2024-11-08 16:28:51 +04:00
Roman Nozdrin	a5c12b98d7	Revert "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 )" This reverts commit `c7caa4374f`.	2024-07-07 13:09:56 +00:00
Sergey Zefirov	db4cb1d657	MCOL-4234 and MCOL 5772 cherry-picked into [stable 23.10] (#3226) * MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194) This patch fixes the problem in MCOL-4234 and also generally improves behavior of GROUP BY. It does so by introducing a "dummy" aggregate and by wrapping columns into it. This allows for columns that are not in GROUP BY to be used more freely, for example, in SELECT * FROM tbl GROUP BY col - all columns that are not "col" will be wrapped into an aggregate and query will proceed to execution. The dummy aggregate itself does nothing more than remember last value passed into it. There is also an additional error message that tries to explain what types of expressions can be wrapped into an aggregate. * MCOL-5772: incorrect ORDER BY ordering for columns not in GROUP BY (#3214) When ORDER BY column is not in GROUP BY, is not an aggregate and there is a SELECT column that is also not an aggregate, there was a problem: ordering happened on the SELECTed column, not ORDERed one. This patch fixes that particular problem and also performs some tidying around newly added aggregate. --------- Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>	2024-06-28 00:31:53 +04:00
Denis Khalikov	9f4231f87f	MCOL-5708 Calculate precision and scale for constant decimal. (#3227 ) This patch calculates precision and scale for constant decimal value for SUM aggregation function.	2024-06-28 00:31:03 +04:00
drrtuy	c7caa4374f	fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 ) * fix(aggregation, disk-based): MCOL-5689 this fixes disk-based distinct aggregation functions Previously disk-based distinct aggregation functions produced incorrect results b/c there was no finalization applied for previous generations stored on disk. * fix(aggregation, disk-based): Fix disk-based COUNT(DISTINCT ...) queries. (Case 2). (Distinct & Multi-Distinct, Single- & Multi-Threaded). * fix(aggregation, disk-based): Fix disk-based DISTINCT & GROUP BY queries. (Case 1). (Distinct & Multi-Distinct, Single- & Multi-Threaded). --------- Co-authored-by: Theresa Hradilak <theresa.hradilak@gmail.com> Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2024-03-24 18:04:37 +03:00
Leonid Fedorov	0c6876d8e4	perf(primproc) MCOL-5601: Initialize two fields once in ctor instead of calling makeConfig std::string fTmpDir = config::Config::makeConfig()->getTempFileDir(config::Config::TempDirPurpose::Aggregates); std::string fCompStr = config::Config::makeConfig()->getConfig("RowAggregation", "Compression");	2023-12-21 18:19:17 +03:00
drrtuy	63b032e3fd	fix(rowstorage): SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock (#3050 )	2023-12-05 18:30:31 +03:00
Leonid Fedorov	86c1c5d537	fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure The new added invariant checking that RGData knows the number of columns and fixed size columns was failing for disk-based aggregation workloads, leading them to provide a wrong result. (The assertion failure happened in RGData::getRow(uint32_t num, Row* row) which is called in the finalization of sub-aggregation results, necessary for merging part results. As the merging failed, duplicate results were output for disk-based aggregation queries. The assertion failure was caused by RGData::deserialize(ByteStream& bs, uint32_t defAmount) not setting rowSize and colCount if necessary (e.g. when the deserialization happens into a new, default RGData, which doesn't know anything about its structure yet. This is the case when the default constructor for RGData() is used, which sets rowSize and columnCount to 0 each. There are three code parts that make use of the default RGData() ctor. The fix is for the use in RowGroupStorage::loadRG(uint64_t rgid, std::unique_ptr<RGData>& rgdata, bool unlinkDump = false), where the default RGData object is used to directly deserialize a ByteStream into it. The deserialize method now checks if both rowSize and columnCount are 0 and if yes sets the read values from the ByteStream for both. We should probably check the other two code parts making use of the default RGData ctor, too. This happens in joinpartition.cpp and tuplejoiner.cpp. --------- Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com>	2023-09-30 00:02:31 +03:00
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support Adds a special column which helps to differentiate data and rollups of various depths and a simple logic to row aggregation to add processing of subtotals.	2023-09-26 17:01:53 +03:00
Leonid Fedorov	8f93fc3623	MCOL-5493: First portion of UBSan fixes (#2842) Multiple UB fixes	2023-06-02 17:02:09 +03:00
Gagan Goel	0be1c3dc8f	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. 1. Input and output RowGroup's used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits causing the OS OOM to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is an uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to an uint32_t.	2023-05-01 13:06:23 -04:00
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822)" (#2828) This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
Leonid Fedorov	f916e64927	No boost condition (#2822) This patch replaces boost primitives with stdlib counterparts.	2023-04-22 00:42:45 +03:00
Leonid Fedorov	c2d0fa24da	replace boost::shared_array<T> to std::shared_ptr<T[]>	2023-04-14 10:33:27 +00:00
Leonid Fedorov	a508b86091	remove boost/shared_array include	2023-04-14 09:42:50 +00:00
Leonid Fedorov	6c32c658d5	MCOL-5385: Delete RowGroup::setData and make Pointer ctor explicit (#2808 ) * Delete RowGroup::setData and make Pointer ctor explicit * some push_backs replaced with emplace_backs * Fixes of review notes	2023-04-13 03:55:30 +03:00
Leonid Fedorov	2e1394149b	MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792 ) * Fixes of bugs from ASAN warnings, part one * MQC as static library, with nifty counter for global map and mutex * Switch clang to 16 * link messageqcpp to execplan	2023-04-04 02:33:23 +03:00
Sergey Zefirov	b53c231ca6	MCOL-271 empty strings should not be NULLs (#2794 ) This patch improves handling of NULLs in textual fields in ColumnStore. Previously empty strings were considered NULLs and it could be a problem if data scheme allows for empty strings. It was also one of major reasons of behavior difference between ColumnStore and other engines in MariaDB family. Also, this patch fixes some other bugs and incorrect behavior, for example, incorrect comparison for "column <= ''" which evaluates to constant True for all purposes before this patch.	2023-03-30 21:18:29 +03:00
Roman Nozdrin	688b47d4e7	MCOL-5451 This resolves external GROUP BY result inconsistency issues Given that idx is a RH hashmap bucket number and info is intra-bucket idx the root cause is triggered by the difference of idx/hash pair calculation for a certain GROUP BY generation and for generation aggregations merging that takes place in RowAggStorage::finalize. This patch generalizes rowHashToIdx to leverage it in both cases mentioned above.	2023-03-25 15:04:16 +00:00
Roman Nozdrin	786b9da5b0	MCOL-5438 COUNT() in math causes SEGV	2023-03-09 20:35:38 +00:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
Leonid Fedorov	123c345b40	remove winport	2023-03-02 15:37:11 +00:00
david.hall	8642231666	Changes to compile local	2022-11-17 11:29:21 -06:00
mariadb-AndreyPiskunov	b57d2c30fe	Minor fixes	2022-10-31 14:56:32 +02:00
mariadb-AndreyPiskunov	1714b75434	Non working attempt to do MCOL-5227	2022-10-31 14:56:32 +02:00
Alexey Antipovsky	440101dfff	[MCOL-5213] Fix a rare IO error	2022-09-14 17:12:15 +03:00
Roman Nozdrin	568ac5ba7b	Merge pull request #2535 from mariadb-corporation/int128Fields Int128 fields	2022-08-28 17:42:15 +01:00
Leonid Fedorov	d2432f9bf6	get rid of pointers for 128 fields	2022-08-26 15:12:22 +00:00
mariadb-AndreyPiskunov	0863ecd279	Replace getBinaryField	2022-08-25 18:21:43 +03:00
Roman Nozdrin	72e264e8ef	MCOL-5199 This patch solves the overall performance degradation introduced with a new way of char columns hashing in aggregation code. The patch disables padding that forces hasher to calculate over the whole 2k buffer. This patch also moves hashing code into the common place where it belongs.	2022-08-24 19:07:06 +00:00

256 Commits