mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00

Author	SHA1	Message	Date
Leonid Fedorov	449029a827	Deep build refactoring phase 2 (#3564 ) * configcpp refactored * chore(build): massive removals, auto add files to debian install file * chore(build): configure before autobake * chore(build): use custom cmake commands for components, mariadb-plugin-columnstore.install generated * chore(build): install deps as separate step for build-packages * more deps * chore(codemanagement, build): build refactoring stage2 * chore(safety): Locked Map for MessageqCpp with a simpler way Please enter the commit message for your changes. Lines starting * chore(codemanagement, ci): better coredumps handling, deps fixed * Delete build/bootstrap_mcs.py * Update charset.cpp (add license)	2025-07-17 16:14:10 +04:00
Leonid Fedorov	aa7e0fb9b4	Deep build refactoring phase 1 (#3562 ) * configcpp refactored * logging and datatypes refactored * more dataconvert * chore(build): massive removals, auto add files to debian install file * chore(codemanagement): nodeps headers, potentioal library * chore(build): configure before autobake * chore(build): use custom cmake commands for components, mariadb-plugin-columnstore.install generated * chore(build): install deps as separate step for build-packages * more deps * check debian/mariadb-plugin-columnstore.install automatically * chore(build): add option for multibracnh compilation * Fix warning	2025-05-30 14:05:21 +04:00
Leonid Fedorov	842ec9dbff	chore(build): fix duplicating hasRollup	2025-05-23 05:12:17 +04:00
Serguey Zefirov	3bb2496ca1	fix: MCOL-5755: incorrect handling of BLOB (and TEXT) in GROUP BY BLOB fields did not work as grouping keys at all, they were assigned value NULL for any value, be it NULL or not. The fix is in the rowaggregation.cpp in the initMapping(), a switch/case branch was added to handle BLOB field copying there. Also, TEXT columns did not distinguish between NULL and empty string in the grouping algorithm, now they do. The fix is in the equals() function, now we specifically check for isNull() equality between values.	2025-05-23 05:12:17 +04:00
Sergey Zefirov	11324c468d	feat(primproc,aggregegation)!: Changes for ROLLUP with single-phase aggregation (#3025 ) The fix is simple: enable subtotals in single-phase aggregation and disable parallel processing when there are subtotals and aggregation is single-phase.	2025-05-23 05:12:17 +04:00
Leonid Fedorov	bab4578118	bug(memory): add uint128_t specializations for Row::setBinaryField_offset and Row::setBinaryField as we have segfault woth ubsan build	2025-05-21 21:59:08 +04:00
Alexey Antipovsky	bd0f59910a	feat(PrimProc): MCOL-5950 Improve disk-based aggregation finalization (#3525 ) * feat(PrimProc): MCOL-5950 Improve disk-based aggregation finalization Iterate over the rows in the plain vector of RGData instead of iterating over the hashmap. This reduces the complexity and speeds up finalization (by up to the twice in the certain cases) * replace magic constant with muggle constant	2025-05-21 10:53:48 +01:00
Leonid Fedorov	d09b7496e4	Revert "fix(MCOL-5386): Bitwise aggregation functions do not work with wide d…" This reverts commit `e0f3bf8322`.	2025-05-20 19:17:05 +04:00
Leonid Fedorov	6db2dc668f	stubs and cmake formatting	2025-05-20 18:22:59 +04:00
Leonid Fedorov	2036e521c7	named linkage	2025-05-20 18:22:59 +04:00
Leonid Fedorov	a1019b7c0e	chore(build): refactor main CMakeLists.txt (#3543 ) * chore(build): refactor main CMakeLists.txt * chore(build): fix boost version for packages, set clang-20 only for amd and arm * chore(build): boost 4 sm * chore(build): boost dep for rowgroup * chore(build): toolset for boost * chore(build): suppress clang warnings for boost * chore(ci, build): use ASAN for unittest on ubuntu 24.04 only, added custom cmake flag option for bootstrap, custom params for new and existing pipelines * chore(build): sort bootstrap flags * chore(CI): remove publish pkg step, adding clickable link instead to publish steps, fix customenv	2025-05-20 05:00:48 +04:00
Akhmad O.	e0f3bf8322	fix(MCOL-5386): Bitwise aggregation functions do not work with wide decimals (updated previous PR) (#3522 ) * fix(MCOL-5386): Bitwise aggregation functions do not work with wide decimals (updated previous PR) * MCOL-5386: Added test for Decimal(18)	2025-05-19 20:41:28 +01:00
Leonid Fedorov	a0bee173f6	chore(build): fixes to satisfy clang19 warnings	2025-05-15 19:05:38 +04:00
Leonid Fedorov	432d0cf7f8	fix(build): replace std::ranges usage for gcc9 and std::span with boost::span	2025-04-15 17:58:11 +04:00
Aleksei Antipovskii	aa5f4fc5e7	fix(aggregation): fix dumping RGDatas to disk `amount` parameter of `RGData::serialize` is the raw size of `rowData` buffer without StringStore/UserDataStore/etc	2025-04-11 15:21:07 +02:00
Aleksei Antipovskii	21ebd1ac20	feat(bytestream): serialize long strings in the common way	2025-04-11 15:21:07 +02:00
Aleksei Antipovskii	4bea7e59a0	feat(PrimProc): MCOL-5852 disk-based GROUP_CONCAT & JSON_ARRAYAGG * move GROUP_CONCAT/JSON_ARRAYAGG storage to the RowGroup from the RowAggregation* * internal data structures (de)serialization * get rid of a specialized classes for processing JSON_ARRAYAGG * move the memory accounting to disk-based aggregation classes * allow aggregation generations to be used for queries with GROUP_CONCAT/JSON_ARRAYAGG * Remove the thread id from the error message as it interferes with the mtr	2025-04-11 15:21:07 +02:00
Aleksei Antipovskii	87d47fd7ae	fix(PrimProc): MCOL-5852 use only long string storage for the group_concat data to reduce memory usage	2025-04-11 15:21:07 +02:00
Akhmad Oripov	a6ab9bd615	fix(funcexp): MCOL-5386 Bitwise aggregation functions do not work with wide decimals (internal error) (#3485 )	2025-04-08 16:47:47 +01:00
drrtuy	04b44a835e	fix(rowgroup): fix for the forgotten attributes assignment	2025-03-27 22:12:48 +00:00
drrtuy	b649af5a0c	chore(): merge cleanup	2025-03-27 22:12:48 +00:00
drrtuy	b14613a66b	fix(aggregation): replaced instances with references	2025-03-27 22:12:48 +00:00
drrtuy	be5711cf0d	feat(): replace getMaxDataSize with getMaxDataSizeWithStrings to accurately account for mem	2025-03-27 22:12:48 +00:00
drrtuy	a4c4d33ee7	feat(): zerocopy TNS case and JOIN results RGData with CountingAllocator	2025-03-27 22:12:48 +00:00
drrtuy	3dfc8cd454	feat(): first cleanup	2025-03-27 22:12:48 +00:00
drrtuy	a6de8ec1ac	feat(): dangling pointer/ref issue has been solved for both RGData and BS	2025-03-27 22:12:48 +00:00
drrtuy	71ed9cabe0	feat(): propagate long strings SP type change	2025-03-27 22:12:48 +00:00
drrtuy	4e86123a5a	feat(): use boost::make_shared b/c most distros can't do allocate_shared for array types.	2025-03-27 22:12:48 +00:00
drrtuy	5f1bd3be12	feat(RGData,StringStore): add counting allocator capabilities to those ctors used in BPP::execute()	2025-03-27 22:12:48 +00:00
Aleksei Antipovskii	0ab03c7258	chore(codestyle): mark virtual methods as override	2025-02-21 20:01:34 +04:00
Sergey Zefirov	60dc7550f1	fix(group by, having): MCOL-5776: GROUP BY/HAVING closer to server's (#3371 ) This patch introduces an internal aggregate operator SELECT_SOME that is automatically added to columns that are not in GROUP BY. It "computes" some plausible value of the column (actually, last one passed). Along the way it fixes incorrect handling of HAVING being transferred into WHERE, window function handling and a bit of other inconsistencies.	2024-12-20 19:11:47 +00:00
Aleksei Antipovskii	e0a01c6cf4	Reapply "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 )" This reverts commit `a5c12b98d7`.	2024-12-11 12:02:24 +00:00
drrtuy	0a71892d97	feat(rowgroup): this returns bits lost during cherry-pick. The bits lost caused the first RGData::serialize to crash a process	2024-11-08 16:28:51 +04:00
drrtuy	dc03621e9d	fix(rowgroup): RGData now uses uint64_t counter for the fixed sizes columns data buf. The buffer can utilize > 4GB RAM that is necessary for PM side join. RGData ctor uses uint32_t allocating data buffer. This fact causes implicit heap overflow.	2024-11-08 16:28:51 +04:00
Roman Nozdrin	a5c12b98d7	Revert "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 )" This reverts commit `c7caa4374f`.	2024-07-07 13:09:56 +00:00
Sergey Zefirov	db4cb1d657	MCOL-4234 and MCOL 5772 cherry-picked into [stable 23.10] (#3226 ) * MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194) This patch fixes the problem in MCOL-4234 and also generally improves behavior of GROUP BY. It does so by introducing a "dummy" aggregate and by wrapping columns into it. This allows for columns that are not in GROUP BY to be used more freely, for example, in SELECT * FROM tbl GROUP BY col - all columns that are not "col" will be wrapped into an aggregate and query will proceed to execution. The dummy aggregate itself does nothing more than remember last value passed into it. There also an additional error message that tries to explain what types of expressions can be wrapped into an aggregate. * MCOL-5772: incorrect ORDER BY ordering for a columns not in GROUP BY (#3214) When ORDER BY column is not in GROUP BY, is not an aggregate and there is a SELECT column that is also not an aggregate, there was a problem: ordering happened on the SELECTed column, not ORDERed one. This patch fixes that particular problem and also performs some tidying around newly added aggregate. --------- Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>	2024-06-28 00:31:53 +04:00
Denis Khalikov	9f4231f87f	MCOL-5708 Calculate precision and scale for constant decimal. (#3227 ) This patch calculates precision and scale for constant decimal value for SUM aggregation function.	2024-06-28 00:31:03 +04:00
drrtuy	c7caa4374f	fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 ) * fix(aggregation, disk-based): MCOL-5689 this fixes disk-based distinct aggregation functions Previously disk-based distinct aggregation functions produced incorrect results b/c there was no finalization applied for previous generations stored on disk. * fix(aggregation, disk-based): Fix disk-based COUNT(DISTINCT ...) queries. (Case 2). (Distinct & Multi-Distinct, Single- & Multi-Threaded). * fix(aggregation, disk-based): Fix disk-based DISTINCT & GROUP BY queries. (Case 1). (Distinct & Multi-Distinct, Single- & Multi-Threaded). --------- Co-authored-by: Theresa Hradilak <theresa.hradilak@gmail.com> Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2024-03-24 18:04:37 +03:00
Leonid Fedorov	0c6876d8e4	perf(primproc) MCOL-5601: Initilize two fields once in ctor instead of calling makeConfig std::string fTmpDir = config::Config::makeConfig()->getTempFileDir(config::Config::TempDirPurpose::Aggregates); std::string fCompStr = config::Config::makeConfig()->getConfig("RowAggregation", "Compression");	2023-12-21 18:19:17 +03:00
drrtuy	63b032e3fd	fix(rowstorage): SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock (#3050 )	2023-12-05 18:30:31 +03:00
Leonid Fedorov	86c1c5d537	fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure The new added invariant checking that RGData knows the number of columns and fixed size columns was failing for disk-based aggregation workloads, leading them to provide a wrong result. (The assertion failure happened in RGData::getRow(uint32_t num, Row* row) which is called in the finalization of sub-aggregation results, necessary for merging part results. As the merging failed, duplicate results were output for disk-based aggregation queries. The assertion failure was caused by RGData::deserialize(ByteStream& bs, uint32_t defAmount) not setting rowSize and colCount if necessary (e.g. when the deserialization happens into a new, default RGData, which doesn't know anything about its structure yet. This is the case when the default constructor for RGData() is used, which sets rowSize and columnCount to 0 each. There are three code parts that make use of the default RGData() ctor. The fix is for the use in RowGroupStorage::loadRG(uint64_t rgid, std::unique_ptr<RGData>& rgdata, bool unlinkDump = false), where the default RGData object is used to directly deserialize a ByteStream into it. The deserialize method now checks if both rowSize and columnCount are 0 and if yes sets the read values from the ByteStream for both. We should probably check the other two code parts making use of the default RGData ctor, too. This happens in joinpartition.cpp and tuplejoiner.cpp. --------- Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com>	2023-09-30 00:02:31 +03:00
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support Adds a special column which helps to differentiate data and rollups of various depts and a simple logic to row aggregation to add processing of subtotals.	2023-09-26 17:01:53 +03:00
Leonid Fedorov	8f93fc3623	MCOL-5493: First portion of UBSan fixes (#2842 ) Multiple UB fixes	2023-06-02 17:02:09 +03:00
Gagan Goel	0be1c3dc8f	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. 1. Input and output RowGroup's used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits causing the OS OOM to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is an uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to an uint32_t.	2023-05-01 13:06:23 -04:00
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822 )" (#2828 ) This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
Leonid Fedorov	f916e64927	No boost condition (#2822 ) This patch replaces boost primitives with stdlib counterparts.	2023-04-22 00:42:45 +03:00
Leonid Fedorov	c2d0fa24da	replace boost::shared_array<T> to std::shared_ptr<T[]>	2023-04-14 10:33:27 +00:00
Leonid Fedorov	a508b86091	remove boost/shared_array include	2023-04-14 09:42:50 +00:00
Leonid Fedorov	6c32c658d5	MCOL-5385: Delete RowGroup::setData and make Pointer ctor explicit (#2808 ) * Delete RowGroup::setData and make Pointer ctor explicit * some push_backs replaced with emplace_backs * Fixes of review notes	2023-04-13 03:55:30 +03:00
Leonid Fedorov	2e1394149b	MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792 ) * Fixes of bugs from ASAN warnings, part one * MQC as static library, with nifty counter for global map and mutex * Switch clang to 16 * link messageqcpp to execplan	2023-04-04 02:33:23 +03:00

1 2 3 4 5 ...

269 Commits