1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00
Commit Graph

269 Commits

Author SHA1 Message Date
449029a827 Deep build refactoring phase 2 (#3564)
* configcpp refactored

* chore(build): massive removals, auto add files to debian install file

* chore(build): configure before autobake

* chore(build): use custom cmake commands for components, mariadb-plugin-columnstore.install generated

* chore(build): install deps as separate step for build-packages

* more deps

* chore(codemanagement, build): build refactoring stage2

* chore(safety): Locked Map for MessageqCpp with a simpler way

 Please enter the commit message for your changes. Lines starting

* chore(codemanagement, ci): better coredumps handling, deps fixed

* Delete build/bootstrap_mcs.py

* Update charset.cpp (add license)
2025-07-17 16:14:10 +04:00
aa7e0fb9b4 Deep build refactoring phase 1 (#3562)
* configcpp refactored
* logging and datatypes refactored

* more dataconvert
* chore(build): massive removals, auto add files to debian install file
* chore(codemanagement): nodeps headers, potentioal library
* chore(build): configure before autobake
* chore(build): use custom cmake commands for components, mariadb-plugin-columnstore.install generated
* chore(build): install deps as separate step for build-packages
* more deps
* check  debian/mariadb-plugin-columnstore.install automatically
* chore(build): add option for multibracnh compilation
* Fix warning
2025-05-30 14:05:21 +04:00
842ec9dbff chore(build): fix duplicating hasRollup 2025-05-23 05:12:17 +04:00
3bb2496ca1 fix: MCOL-5755: incorrect handling of BLOB (and TEXT) in GROUP BY
BLOB fields did not work as grouping keys at all, they were assigned
value NULL for any value, be it NULL or not. The fix is in the
rowaggregation.cpp in the initMapping(), a switch/case branch was added
to handle BLOB field copying there.

Also, TEXT columns did not distinguish between NULL and empty string in
the grouping algorithm, now they do. The fix is in the equals()
function, now we specifically check for isNull() equality between
values.
2025-05-23 05:12:17 +04:00
11324c468d feat(primproc,aggregegation)!: Changes for ROLLUP with single-phase aggregation (#3025)
The fix is simple: enable subtotals in single-phase aggregation and
disable parallel processing when there are subtotals and aggregation is
single-phase.
2025-05-23 05:12:17 +04:00
bab4578118 bug(memory): add uint128_t specializations for Row::setBinaryField_offset and Row::setBinaryField as we have segfault woth ubsan build 2025-05-21 21:59:08 +04:00
bd0f59910a feat(PrimProc): MCOL-5950 Improve disk-based aggregation finalization (#3525)
* feat(PrimProc): MCOL-5950 Improve disk-based aggregation finalization

    Iterate over the rows in the plain vector of RGData instead of
    iterating over the hashmap. This reduces the complexity and speeds
    up finalization (by up to the twice in the certain cases)

* replace magic constant with muggle constant
2025-05-21 10:53:48 +01:00
d09b7496e4 Revert "fix(MCOL-5386): Bitwise aggregation functions do not work with wide d…"
This reverts commit e0f3bf8322.
2025-05-20 19:17:05 +04:00
6db2dc668f stubs and cmake formatting 2025-05-20 18:22:59 +04:00
2036e521c7 named linkage 2025-05-20 18:22:59 +04:00
a1019b7c0e chore(build): refactor main CMakeLists.txt (#3543)
* chore(build): refactor main CMakeLists.txt

* chore(build): fix boost version for packages, set clang-20 only for amd and arm

* chore(build): boost 4 sm

* chore(build): boost dep for rowgroup

* chore(build): toolset for boost

* chore(build): suppress clang warnings for boost

* chore(ci, build): use ASAN for unittest on ubuntu 24.04 only, added custom cmake flag option for bootstrap, custom params for new and existing pipelines

* chore(build): sort bootstrap flags

* chore(CI): remove publish pkg step, adding clickable link instead to publish steps, fix customenv
2025-05-20 05:00:48 +04:00
e0f3bf8322 fix(MCOL-5386): Bitwise aggregation functions do not work with wide decimals (updated previous PR) (#3522)
* fix(MCOL-5386): Bitwise aggregation functions do not work with wide decimals (updated previous PR)

* MCOL-5386: Added test for Decimal(18)
2025-05-19 20:41:28 +01:00
a0bee173f6 chore(build): fixes to satisfy clang19 warnings 2025-05-15 19:05:38 +04:00
432d0cf7f8 fix(build): replace std::ranges usage for gcc9 and std::span with boost::span 2025-04-15 17:58:11 +04:00
aa5f4fc5e7 fix(aggregation): fix dumping RGDatas to disk
`amount` parameter of `RGData::serialize` is the raw size of `rowData`
buffer without StringStore/UserDataStore/etc
2025-04-11 15:21:07 +02:00
21ebd1ac20 feat(bytestream): serialize long strings in the common way 2025-04-11 15:21:07 +02:00
4bea7e59a0 feat(PrimProc): MCOL-5852 disk-based GROUP_CONCAT & JSON_ARRAYAGG
* move GROUP_CONCAT/JSON_ARRAYAGG storage to the RowGroup from
  the RowAggregation*
* internal data structures (de)serialization
* get rid of a specialized classes for processing JSON_ARRAYAGG
* move the memory accounting to disk-based aggregation classes
* allow aggregation generations to be used for queries with
  GROUP_CONCAT/JSON_ARRAYAGG
* Remove the thread id from the error message as it interferes with the mtr
2025-04-11 15:21:07 +02:00
87d47fd7ae fix(PrimProc): MCOL-5852 use only long string storage
for the group_concat data to reduce memory usage
2025-04-11 15:21:07 +02:00
a6ab9bd615 fix(funcexp): MCOL-5386 Bitwise aggregation functions do not work with wide decimals (internal error) (#3485) 2025-04-08 16:47:47 +01:00
04b44a835e fix(rowgroup): fix for the forgotten attributes assignment 2025-03-27 22:12:48 +00:00
b649af5a0c chore(): merge cleanup 2025-03-27 22:12:48 +00:00
b14613a66b fix(aggregation): replaced instances with references 2025-03-27 22:12:48 +00:00
be5711cf0d feat(): replace getMaxDataSize with getMaxDataSizeWithStrings to accurately account for mem 2025-03-27 22:12:48 +00:00
a4c4d33ee7 feat(): zerocopy TNS case and JOIN results RGData with CountingAllocator 2025-03-27 22:12:48 +00:00
3dfc8cd454 feat(): first cleanup 2025-03-27 22:12:48 +00:00
a6de8ec1ac feat(): dangling pointer/ref issue has been solved for both RGData and BS 2025-03-27 22:12:48 +00:00
71ed9cabe0 feat(): propagate long strings SP type change 2025-03-27 22:12:48 +00:00
4e86123a5a feat(): use boost::make_shared b/c most distros can't do allocate_shared for array types. 2025-03-27 22:12:48 +00:00
5f1bd3be12 feat(RGData,StringStore): add counting allocator capabilities to those ctors used in BPP::execute() 2025-03-27 22:12:48 +00:00
0ab03c7258 chore(codestyle): mark virtual methods as override 2025-02-21 20:01:34 +04:00
60dc7550f1 fix(group by, having): MCOL-5776: GROUP BY/HAVING closer to server's (#3371)
This patch introduces an internal aggregate operator SELECT_SOME that
is automatically added to columns that are not in GROUP BY. It
"computes" some plausible value of the column (actually, last one
passed).

Along the way it fixes incorrect handling of HAVING being transferred
into WHERE, window function handling and a bit of other inconsistencies.
2024-12-20 19:11:47 +00:00
e0a01c6cf4 Reapply "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145)"
This reverts commit a5c12b98d7.
2024-12-11 12:02:24 +00:00
0a71892d97 feat(rowgroup): this returns bits lost during cherry-pick. The bits lost caused the first RGData::serialize to crash a process 2024-11-08 16:28:51 +04:00
dc03621e9d fix(rowgroup): RGData now uses uint64_t counter for the fixed sizes columns data buf.
The buffer can utilize > 4GB RAM that is necessary for PM side join.
	RGData ctor uses uint32_t allocating data buffer.
 	This fact causes implicit heap overflow.
2024-11-08 16:28:51 +04:00
a5c12b98d7 Revert "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145)"
This reverts commit c7caa4374f.
2024-07-07 13:09:56 +00:00
db4cb1d657 MCOL-4234 and MCOL 5772 cherry-picked into [stable 23.10] (#3226)
* MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194)

This patch fixes the problem in MCOL-4234 and also generally improves
behavior of GROUP BY.

It does so by introducing a "dummy" aggregate and by wrapping columns
into it. This allows for columns that are not in GROUP BY to be used
more freely, for example, in SELECT * FROM tbl GROUP BY col - all
columns that are not "col" will be wrapped into an aggregate and query
will proceed to execution.

The dummy aggregate itself does nothing more than remember last value
passed into it.

There also an additional error message that tries to explain what types
of expressions can be wrapped into an aggregate.

* MCOL-5772: incorrect ORDER BY ordering for a columns not in GROUP BY (#3214)

When ORDER BY column is not in GROUP BY, is not an aggregate and there
is a SELECT column that is also not an aggregate, there was a problem:
ordering happened on the SELECTed column, not ORDERed one.

This patch fixes that particular problem and also performs some tidying
around newly added aggregate.

---------

Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>
2024-06-28 00:31:53 +04:00
9f4231f87f MCOL-5708 Calculate precision and scale for constant decimal. (#3227)
This patch calculates precision and scale for constant decimal
value for SUM aggregation function.
2024-06-28 00:31:03 +04:00
c7caa4374f fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145)
* fix(aggregation, disk-based): MCOL-5689 this fixes disk-based distinct aggregation functions
Previously disk-based distinct aggregation functions produced incorrect results b/c there was no finalization applied for previous generations stored on disk.

*  fix(aggregation, disk-based): Fix disk-based COUNT(DISTINCT ...) queries. (Case 2). (Distinct & Multi-Distinct, Single- & Multi-Threaded).

* fix(aggregation, disk-based): Fix disk-based DISTINCT & GROUP BY queries. (Case 1). (Distinct & Multi-Distinct, Single- & Multi-Threaded).

---------

Co-authored-by: Theresa Hradilak <theresa.hradilak@gmail.com>
Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
2024-03-24 18:04:37 +03:00
0c6876d8e4 perf(primproc) MCOL-5601: Initilize two fields once in ctor instead of calling makeConfig
std::string fTmpDir = config::Config::makeConfig()->getTempFileDir(config::Config::TempDirPurpose::Aggregates);
std::string fCompStr = config::Config::makeConfig()->getConfig("RowAggregation", "Compression");
2023-12-21 18:19:17 +03:00
63b032e3fd fix(rowstorage): SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock (#3050) 2023-12-05 18:30:31 +03:00
86c1c5d537 fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure
The new added invariant checking that RGData knows the number of columns and fixed size columns was failing for disk-based aggregation workloads, leading them to provide a wrong result. (The assertion failure happened in RGData::getRow(uint32_t num, Row* row) which is called in the finalization of sub-aggregation results, necessary for merging part results. As the merging failed, duplicate results were output for disk-based aggregation queries.
The assertion failure was caused by RGData::deserialize(ByteStream& bs,
uint32_t defAmount) not setting rowSize and colCount if necessary (e.g.
when the deserialization happens into a new, default RGData, which
doesn't know anything about its structure yet. This is the case when the
default constructor for RGData() is used, which sets rowSize and
columnCount to 0 each.
There are three code parts that make use of the default RGData() ctor.
The fix is for the use in RowGroupStorage::loadRG(uint64_t rgid,
std::unique_ptr<RGData>& rgdata, bool unlinkDump = false), where the
default RGData object is used to directly deserialize a ByteStream into
it. The deserialize method now checks if both rowSize and columnCount
are 0 and if yes sets the read values from the ByteStream for both.
We should probably check the other two code parts making use of the
default RGData ctor, too. This happens in joinpartition.cpp and
tuplejoiner.cpp.

---------

Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com>
2023-09-30 00:02:31 +03:00
920607520c feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support
Adds a special column which helps to differentiate data and rollups of
various depts and a simple logic to row aggregation to add processing of
subtotals.
2023-09-26 17:01:53 +03:00
8f93fc3623 MCOL-5493: First portion of UBSan fixes (#2842)
Multiple UB fixes
2023-06-02 17:02:09 +03:00
0be1c3dc8f MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing.
1. Input and output RowGroup's used in GROUP_CONCAT classes
are currently allocating a raw memory buffer of size equal
to the actual width of the string datatype. As an example,
for the following query:
  SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with default width, the input
RowGroup containing the target rows to be concatenated will
assign 64kb of memory for every input row in the RowGroup.
This is wasteful as actual field values in real workloads
would be much smaller. We fix this by enabling the
RowGroup to use the StringStore when the RowGroup contains
long strings.

2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer is equal to the
row size for the output RowGroup. For the above scenario,
using the default group_concat_max_len (which is a server
variable that sets the maximum length of the GROUP_CONCAT string)
value of 1mb, the buffer size would be
(1mb + 64kb + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3gb, this buffer
size would be ~3gb. Now if the runtime initiates several
instances of RowAggregation, total memory consumption by
PrimProc could exceed the hardware memory limits causing the
OS OOM to kill the process. We fix this problem by again
enabling the StringStore for the NULL row allocation.

3. In the plugin code in buildAggregateColumn(), there is
an integer overflow when the server group_concat_max_len
variable (which is an uint32_t) is set to a value > INT32_MAX
(such as 3gb) and is assigned to
CalpontSystemCatalog::ColType::colWidth (which is an int32_t).
As a short term fix, we saturate the assigned value to colWidth
to INT32_MAX. Proper fix would be to upgrade
CalpontSystemCatalog::ColType::colWidth to an uint32_t.
2023-05-01 13:06:23 -04:00
4fe9cd64a3 Revert "No boost condition (#2822)" (#2828)
This reverts commit f916e64927.
2023-04-22 15:49:50 +03:00
f916e64927 No boost condition (#2822)
This patch replaces boost primitives with stdlib counterparts.
2023-04-22 00:42:45 +03:00
c2d0fa24da replace boost::shared_array<T> to std::shared_ptr<T[]> 2023-04-14 10:33:27 +00:00
a508b86091 remove boost/shared_array include 2023-04-14 09:42:50 +00:00
6c32c658d5 MCOL-5385: Delete RowGroup::setData and make Pointer ctor explicit (#2808)
* Delete RowGroup::setData and make Pointer ctor explicit

* some push_backs replaced with emplace_backs

* Fixes of review notes
2023-04-13 03:55:30 +03:00
2e1394149b MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792)
* Fixes of bugs from ASAN warnings, part one

* MQC as static library, with nifty counter for global map and mutex

* Switch clang to 16

* link messageqcpp to execplan
2023-04-04 02:33:23 +03:00