mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00
Commit Graph

1169 Commits

Author SHA1 Message Date
7c78af59a4 Try2 2024-08-07 22:16:04 +04:00
6e995e2e80 fix: MCOL-5755: incorrect handling of BLOB (and TEXT) in GROUP BY
BLOB fields did not work as grouping keys at all, they were assigned
value NULL for any value, be it NULL or not. The fix is in the
rowaggregation.cpp in the initMapping(), a switch/case branch was added
to handle BLOB field copying there.

Also, TEXT columns did not distinguish between NULL and empty string in
the grouping algorithm, now they do. The fix is in the equals()
function, now we specifically check for isNull() equality between
values.
2024-07-11 11:03:05 +03:00
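The NULL-versus-empty-string distinction described in this commit can be illustrated with a small sketch. This is not the code from rowaggregation.cpp; `TextKey` and `sameGroup` are hypothetical names, and `std::optional` merely stands in for however the engine marks a value as NULL.

```cpp
#include <optional>
#include <string>

// Hypothetical grouping-key value: std::nullopt models SQL NULL,
// an engaged optional holding "" models the empty string.
using TextKey = std::optional<std::string>;

// Two keys fall into the same group only if their NULL-ness matches
// and, when both are non-NULL, their contents match.
bool sameGroup(const TextKey& a, const TextKey& b)
{
  if (a.has_value() != b.has_value())
    return false;  // NULL vs. non-NULL (including "") never match
  if (!a.has_value())
    return true;   // both NULL: GROUP BY keeps all NULLs in one group
  return *a == *b;
}
```

Comparing only the string contents would treat NULL and "" as the same key, which is the behaviour the commit message says was fixed.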
57e2375dbc fix(funcexp): MCOL-4671 Fix behaviour of LEFT/RIGHT functions when negative trim length value is passed 2024-07-04 12:51:01 +04:00
a1e64d4cb0 bug(priproc) make last_day type a bit more accurate
This fixes a discrepancy with the server, which assigns DATE type to
last_day()'s result.

Now we also assign the DATE result type and use the proper
dataconvert::Day data structure to return date.

Tests agree with InnoDB.

Also, this patch includes test for MCOL-5669, to show we fixed it.
2024-07-01 16:25:44 +03:00
2444f96b11 Merge pull request #3202 from denis0x0D/MCOL-5708
MCOL-5708 Calculate precision and scale for constant decimal.
2024-06-24 11:16:58 +03:00
1122b64cb1 MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194)
This patch fixes the problem in MCOL-4234 and also generally improves
behavior of GROUP BY.

It does so by introducing a "dummy" aggregate and by wrapping columns
into it. This allows for columns that are not in GROUP BY to be used
more freely, for example, in SELECT * FROM tbl GROUP BY col - all
columns that are not "col" will be wrapped into an aggregate and query
will proceed to execution.

The dummy aggregate itself does nothing more than remember last value
passed into it.

There is also an additional error message that tries to explain what types
of expressions can be wrapped into an aggregate.
2024-06-17 20:00:54 +03:00
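A minimal sketch of the "dummy" aggregate idea from this commit, assuming a simplified two-column row. `LastValue` and `groupByKey` are hypothetical names for illustration and do not mirror the engine's aggregation classes.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "dummy" aggregate: it only remembers the last value fed to it.
struct LastValue
{
  std::optional<std::string> value;
  void update(const std::string& v) { value = v; }
};

// Group rows of (key, other) by key. The non-grouped column is wrapped
// in LastValue, so each group reports the last "other" value it saw.
std::map<int, std::string> groupByKey(const std::vector<std::pair<int, std::string>>& rows)
{
  std::map<int, LastValue> groups;
  for (const auto& [key, other] : rows)
    groups[key].update(other);

  std::map<int, std::string> result;
  for (const auto& [key, agg] : groups)
    result[key] = *agg.value;  // every group has seen at least one row
  return result;
}
```

With rows (1, "a"), (2, "b"), (1, "c") the sketch yields "c" for key 1 and "b" for key 2, which is what "remember last value" implies for a query such as SELECT * FROM tbl GROUP BY col.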
b1045d27b6 fix(funcexp): MCOL-5237 Proper handle DATETIME column for "ifnull" function. (#3196) 2024-06-17 12:09:14 +01:00
113d9873a3 Containers memory limits for CI (#3108)
Limit test containers by memory, fix cgroup path inside the containers by introducing new ugly setting name 

---------

Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>
2024-06-16 19:16:23 +04:00
ccb7ba5914 MCOL-5708 Calculate precision and scale for constant decimal.
This patch calculates precision and scale for constant decimal
value for SUM aggregation function.
2024-06-11 15:48:46 +00:00
a8d3fff79e chore(build) Rocky8 gcc vanilla build fix 2024-04-16 17:08:06 +03:00
1c88a7fcd8 MCOL-5597 Rollback changes introduced for DJS.
This patch changes:
1. The number of buckets created on each split.
2. The heuristic which calculates the bucket size.
2024-04-15 19:37:29 +03:00
3b7e69135d Fixes MCOL-5700, Oracle mode test results
This changeset contains fixes in Oracle mode tests and for the
implementation of the CONCAT_ORACLE. Also, we harmonise our
translation process with the recent changes in the server.

Due to changed behavior of the server, some CREATE VIEW/EXPLAIN
statements began to produce unexpected results and needed to be
fixed.

Also, the name of the concatenation operation changed. This led to
the disabled func_concat_oracle test being enabled to test it, and it
turned out that our implementation of this function was broken
and needed to be fixed too.
2024-04-15 19:35:21 +03:00
af5ae35413 Revert "Fixes MCOL-5700, Oracle mode test results" 2024-03-27 18:52:30 +04:00
56b35d5cf6 Merge pull request #3156 from mariadb-corporation/sz-fix-oracle-mode
Fixes MCOL-5700, Oracle mode test results
2024-03-27 14:45:52 +06:00
34acd3559b Fixes MCOL-5700, Oracle mode test results
This changeset contains fixes in Oracle mode tests and for the
implementation of the CONCAT_ORACLE. Also, we harmonise our
translation process with the recent changes in the server.

Due to changed behavior of the server, some CREATE VIEW/EXPLAIN
statements began to produce unexpected results and needed to be
fixed.

Also, the name of the concatenation operation changed. This led to
the disabled func_concat_oracle test being enabled to test it, and it
turned out that our implementation of this function was broken
and needed to be fixed too.
2024-03-27 10:00:39 +03:00
444cf4c65e fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145)
* fix(aggregation, disk-based): MCOL-5689 this fixes disk-based distinct aggregation functions
Previously disk-based distinct aggregation functions produced incorrect results b/c there was no finalization applied for previous generations stored on disk.

*  fix(aggregation, disk-based): Fix disk-based COUNT(DISTINCT ...) queries. (Case 2). (Distinct & Multi-Distinct, Single- & Multi-Threaded).

* fix(aggregation, disk-based): Fix disk-based DISTINCT & GROUP BY queries. (Case 1). (Distinct & Multi-Distinct, Single- & Multi-Threaded).

---------

Co-authored-by: Theresa Hradilak <theresa.hradilak@gmail.com>
Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
2024-03-26 00:39:53 +03:00
5f40fb32d0 MCOL-5328: use PCRE2 and JPCRE wrapper (#3137)
PCRE2 for regexp functions in columnstore
2024-03-14 19:39:29 +04:00
83c2408f8d fix(join, threadpool): MCOL-5565: MCOL-5636: MCOL-5645: port from develop-23.02 to [develop] (#3128)
* fix(threadpool): MCOL-5565 queries stuck in FairThreadScheduler. (#3100)

Meta Primitive Jobs, e.g. ADD_JOINER, LAST_JOINER, get stuck
	in Fair scheduler without out-of-band scheduler. Add OOB
	scheduler back to remedy the issue.

* fix(messageqcpp): MCOL-5636 same node communication crashes transmitting PP errors to EM b/c error messaging leveraged socket that was a nullptr. (#3106)

* fix(threadpool): MCOL-5645 erroneous threadpool Job ctor implicitly sets socket shared_ptr to nullptr causing sigabrt when threadpool returns an error (#3125)

---------

Co-authored-by: drrtuy <roman.nozdrin@mariadb.com>
2024-02-13 19:01:16 +03:00
ebcf43a517 fix(join, disk-based): MCOL-5597: large side read errors (#3117)
The large side read errors mentioned there can be due to failure to
close file stream properly. Some of the data may still reside in the
file stream buffers, closing must flush it. The flush is an I/O
operation and can fail, leading to partial write and subsequent partial
read.

This patch tries to provide better diagnostics.
2024-02-09 22:25:43 +03:00
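The point about close() flushing buffered data can be sketched with the standard library alone. `writeAndClose` is a hypothetical helper, not a function from the patch; it only shows where the error check has to sit.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Write `data` to `path` and report whether everything reached the file.
// close() flushes the bytes still held in the stream buffer; if that flush
// fails, the stream's failbit is set, so the state must be checked after
// close() and not only after write().
bool writeAndClose(const std::string& path, const std::string& data)
{
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out)
    return false;  // could not open the file at all
  out.write(data.data(), static_cast<std::streamsize>(data.size()));
  out.close();
  return !out.fail();
}
```

A caller that ignores the result of the final flush can end up with a partially written file, which matches the "partial write and subsequent partial read" scenario in the commit message.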
0d1c72a563 compilation fix for gcc12 on known gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329 2024-01-04 11:43:03 +03:00
4d7a6a0be5 perf(primproc) MCOL-5601: Initialize two fields once in ctor instead of calling makeConfig
std::string fTmpDir = config::Config::makeConfig()->getTempFileDir(config::Config::TempDirPurpose::Aggregates);
std::string fCompStr = config::Config::makeConfig()->getConfig("RowAggregation", "Compression");
2023-12-19 15:25:19 +03:00
9d5ad925eb fix(linkage) link libm to libmarias3 2023-12-18 14:10:14 +03:00
1f958c9ed2 MCOL-5625: Fixes json_query implementation
Also extends func_json_value.test.
2023-12-12 15:45:03 +03:00
865cca11c9 MCOL-5505 Add TypeHandler functions. 2023-11-30 01:47:13 +04:00
fe597ec78c MCOL-5505 add parquet support for cpimport and add mcs_parquet_ddl and mcs_parquet_gen tools 2023-11-30 01:47:13 +04:00
792aea2a7c Fixes MCOL-5599 where LIKE operator never finishes
This is a fix of logging subsystem, nothing else.

The old code expanded an argument into string and advanced too little
and, if expansion contained argument's index, it expanded it again. And
again.
2023-11-29 19:17:16 +04:00
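The failure mode described here (resuming the scan inside text that was just inserted) can be shown with a small placeholder expander. This is a sketch, not the logging code: `expandArgs` is a hypothetical name, and the `%1%`, `%2%` placeholder syntax is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Replace placeholders of the assumed form %1%, %2%, ... with args[0], args[1], ...
// After each substitution the scan position jumps past the inserted text,
// so an argument that itself contains "%1%" is never expanded again.
std::string expandArgs(std::string msg, const std::vector<std::string>& args)
{
  std::string::size_type pos = 0;
  for (;;)
  {
    const auto open = msg.find('%', pos);
    if (open == std::string::npos)
      break;
    const auto close = msg.find('%', open + 1);
    if (close == std::string::npos)
      break;

    const std::string digits = msg.substr(open + 1, close - open - 1);
    const bool isIndex = !digits.empty() && digits.size() <= 3 &&
                         digits.find_first_not_of("0123456789") == std::string::npos;
    const unsigned long idx = isIndex ? std::stoul(digits) : 0;
    if (!isIndex || idx == 0 || idx > args.size())
    {
      pos = open + 1;  // not a valid placeholder: skip this '%'
      continue;
    }
    const std::string& arg = args[idx - 1];
    msg.replace(open, close - open + 1, arg);
    pos = open + arg.size();  // advance past the whole inserted argument
  }
  return msg;
}
```

If `pos` were left at `open` instead, an argument equal to "%1%" would be found and substituted again on every iteration, and the loop would never terminate.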
8632c85ecf feat(primproc,aggregation)!: Changes for ROLLUP with single-phase aggregation (#3025)
The fix is simple: enable subtotals in single-phase aggregation and
disable parallel processing when there are subtotals and aggregation is
single-phase.
2023-11-28 17:33:02 +03:00
76e4e13b80 fix(rowgroup,stringstore): MCOL-5597 Set length for nullptr string to 0. (#3027) 2023-11-28 17:18:52 +03:00
5c9770d1e6 fix(funcexp): MCOL-5607: JSON function use crashes query execution (#3028)
JSON functions were implemented violating an assumption of their
pureness, as they should not have any state. This concrete patch
fixes implementation of JSON_VALUE function.
2023-11-21 23:46:03 +03:00
69b8e1c779 feat(extent-elimination)!: re-enable extent-elimination for dictionary columns scanning
This is "productization" of an old code that would enable extent
elimination for dictionary columns.

This concrete patch enables it, fixes performance degradation (main
problem with old code) and also fixes incorrect behavior of cpimport.
2023-11-17 17:14:35 +03:00
67c842e792 Merge pull request #3017 from drrtuy/fix/MCOL-5472-urandom-mutex
fix(rowstorage): MCOL-5472 SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock
2023-11-02 16:10:57 +02:00
dfc9e89496 fix(rowstorage): SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock 2023-11-01 18:19:45 +00:00
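For reference, SplitMix64 itself is only a few lines. The sketch below uses the widely published constants of the algorithm; the class name and interface are hypothetical and are not taken from the rowstorage patch.

```cpp
#include <cstdint>

// SplitMix64: a small, fast, non-cryptographic PRNG whose whole state is a
// single 64-bit counter, so drawing a number needs no lock and no system call.
class SplitMix64
{
 public:
  explicit SplitMix64(std::uint64_t seed) : state_(seed) {}

  std::uint64_t next()
  {
    // Advance the counter by the golden-ratio increment, then mix the bits.
    std::uint64_t z = (state_ += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
  }

 private:
  std::uint64_t state_;
};
```

Each instance is independent, so giving every thread or storage object its own generator avoids any shared state that would need guarding.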
f7045457f2 chore(datatypes): refactoring math ops results domain check functionality 2023-10-25 09:12:54 +00:00
eb744eafed chore(datatypes): this refactors the placement of the main SQL data types enum to enable templates that are parametrized with this enum(see mcs_datatype_basic.h changes for more details). 2023-10-24 18:44:35 +03:00
1f71847e99 fix(packaging) dh_missing: warnings are treated as errors for buildbot debians
dh_missing: warning: Compatibility levels before 10 are deprecated (level 9 in use)
dh_missing: warning: usr/lib/x86_64-linux-gnu/libmessageqcpp.a exists in debian/tmp but is not installed to anywhere
dh_missing: warning: usr/lib/x86_64-linux-gnu/libpron.a exists in debian/tmp but is not installed to anywhere

so do not install static libraries as targets on CMake
2023-10-04 13:20:24 -04:00
86c1c5d537 fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure
The newly added invariant check that RGData knows the number of columns and fixed size columns was failing for disk-based aggregation workloads, leading them to provide a wrong result. (The assertion failure happened in RGData::getRow(uint32_t num, Row* row), which is called in the finalization of sub-aggregation results, necessary for merging partial results. As the merging failed, duplicate results were output for disk-based aggregation queries.)
The assertion failure was caused by RGData::deserialize(ByteStream& bs,
uint32_t defAmount) not setting rowSize and colCount if necessary (e.g.
when the deserialization happens into a new, default RGData, which
doesn't know anything about its structure yet). This is the case when the
default constructor for RGData() is used, which sets rowSize and
columnCount to 0 each.
There are three code parts that make use of the default RGData() ctor.
The fix is for the use in RowGroupStorage::loadRG(uint64_t rgid,
std::unique_ptr<RGData>& rgdata, bool unlinkDump = false), where the
default RGData object is used to directly deserialize a ByteStream into
it. The deserialize method now checks if both rowSize and columnCount
are 0 and if yes sets the read values from the ByteStream for both.
We should probably check the other two code parts making use of the
default RGData ctor, too. This happens in joinpartition.cpp and
tuplejoiner.cpp.

---------

Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com>
2023-09-30 00:02:31 +03:00
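The "set only if both are still 0" rule from this commit can be sketched in isolation. `RowGroupHeader` and `adoptIfUnset` are hypothetical stand-ins; the real RGData::deserialize reads these values from a ByteStream, which is omitted here.

```cpp
#include <cstdint>

// Hypothetical stand-in for the size fields of a row-group buffer;
// this is not the real RGData class.
struct RowGroupHeader
{
  std::uint32_t rowSize = 0;      // 0 after default construction: structure unknown
  std::uint32_t columnCount = 0;

  // Adopt the sizes read from the stream only when this object does not
  // know its structure yet; an already configured object keeps its own.
  void adoptIfUnset(std::uint32_t streamRowSize, std::uint32_t streamColumnCount)
  {
    if (rowSize == 0 && columnCount == 0)
    {
      rowSize = streamRowSize;
      columnCount = streamColumnCount;
    }
  }
};
```

A default-constructed object picks up the stream's values, while an object that already knows its layout is left untouched.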
920607520c feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support
Adds a special column which helps to differentiate data and rollups of
various depths, and simple logic in row aggregation to add processing of
subtotals.
2023-09-26 17:01:53 +03:00
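One way to picture the "special column" is as a rollup-depth marker stored next to the grouping keys. The sketch below computes SUM for GROUP BY a, b WITH ROLLUP under that assumption; `Row`, `Key`, and `sumWithRollup` are hypothetical names and do not reflect the engine's row layout.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct Row
{
  std::string a, b;
  long v;
};

// Output key: (rollupDepth, a, b). Depth 0 is an ordinary group, depth 1 is
// a subtotal over all b for one a, depth 2 is the grand total. Rolled-up
// key columns are stored as "" here; the depth marker tells them apart
// from data rows that genuinely contain "".
using Key = std::tuple<int, std::string, std::string>;

std::map<Key, long> sumWithRollup(const std::vector<Row>& rows)
{
  std::map<Key, long> out;
  for (const Row& r : rows)
  {
    out[Key{0, r.a, r.b}] += r.v;  // regular GROUP BY a, b
    out[Key{1, r.a, ""}] += r.v;   // subtotal per a
    out[Key{2, "", ""}] += r.v;    // grand total
  }
  return out;
}
```

Without the marker, a subtotal row and a data row whose key column happens to be empty (or NULL in SQL terms) would collide in the same group.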
fd94ab5042 chore(logging): move cgroup /cgroup version log from constructor to getTotalMemory to avoid duplicate log as constructor is called per query 2023-09-25 22:17:09 +03:00
7f9c624626 MCOL-5573 Fix cpimport truncation of TEXT columns.
1. Restore the utf8_truncate_point() function in utils/common/utils_utf8.h
that I removed as part of the patch for MCOL-4931.

2. As per the definition of TEXT columns, the default column width represents
the maximum number of bytes that can be stored in the TEXT column. So the
effective maximum length is less if the value contains multi-byte characters.
However, if the user explicitly specifies the length of the TEXT column in a
table DDL, such as TEXT(65535), then the DDL logic ensures that enough
bytes are allocated (up to a system maximum) to allow up to that many
characters (multi-byte characters if the charset for the column is multi-byte,
such as utf8mb3).
2023-09-20 12:23:22 -04:00
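Byte-limited truncation that respects character boundaries can be sketched as follows. `truncateUtf8` is a hypothetical helper written for illustration; it is not the utf8_truncate_point() function from utils/common/utils_utf8.h, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Return the longest prefix of `s` that fits in `maxBytes` bytes without
// cutting a multi-byte UTF-8 sequence in half. Assumes `s` is valid UTF-8.
std::string truncateUtf8(const std::string& s, std::size_t maxBytes)
{
  if (s.size() <= maxBytes)
    return s;
  std::size_t cut = maxBytes;
  // Continuation bytes have the form 10xxxxxx. If the first dropped byte is
  // one, the cut is inside a character, so back up to that character's start.
  while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80)
    --cut;
  return s.substr(0, cut);
}
```

For a column limited to a number of bytes, this is why the effective character count shrinks when values contain multi-byte characters: a two-byte character that does not fit entirely is dropped as a whole.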
931f2b36a1 MCOL-4931 Make cpimport charset-aware. (#2938)
1. Extend the following CalpontSystemCatalog member functions to
   set CalpontSystemCatalog::ColType::charsetNumber, after the
   system catalog update to add charset number to calpontsys.syscolumn
   in MCOL-5005:
     CalpontSystemCatalog::lookupOID
     CalpontSystemCatalog::colType
     CalpontSystemCatalog::columnRIDs
     CalpontSystemCatalog::getSchemaInfo

2. Update cpimport to use the CHARSET_INFO object associated with the
   charset number retrieved from the system catalog, for a
   dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate
   long strings that exceed the target column character length.

3. Add MTR test cases.
2023-09-05 17:17:20 +03:00
5b4f06bf0d Logging of memory (#2930)
* -logging of memory WIP

* -better log for cgroup case

* -fix log

* -display in GIB

* add log for freememory for non CGROUP
(to be discussed)

* test repeated log entries

* -added counter for every 1000 calls. effectively 15m

* Name logging period and increase it, clear config files from PR, add .gitignore

---------

Co-authored-by: pgmabv99 <alexey.vorovich@gmail.com>
Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>
2023-09-05 15:46:29 +03:00
765dd46b61 fix(pp-threadpool): the workaround for a stuck tests001 in CI (#2931)
CI occasionally gets stuck running test001 b/c PP threadpool endlessly reschedules
    meta jobs, e.g. BATCH_PRIMITIVE_CREATE, whose ByteStreams were somehow damaged or read out.

Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>
2023-08-18 00:02:31 +03:00
48562e41f9 feat(datatypes): MCOL-4632 and MCOL-4648, fix cast leads to NULL.
Remove redundant cast.

As C-style casts with a type name in parentheses are interpreted as static_casts this literally just changes the interpretation around (and forces an implicit cast to match the return value of the function).

Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency.

Make consistent with relation between BIGINTNULL and BIGINTEMPTYROW & make adapted cast behaviour due to NULL markers more intuitive. (After this change we can simply block the highest possible uint64_t value and, if a cast results in it, print the next lower value (2^64 - 2). Previously, (2^64 - 1) could be printed, but (2^64 - 2), being blocked by the UBIGINTNULL constant, could not, which made finding the appropriate replacement value more confusing.)

Introduce MAX_MCS_UBIGINT and MIN_MCS_BIGINT and adapt casts.

Adapt casting to BIGINT to remove NULL marker error.

Add bugfix regression test for MCOL 4632

Add regression test for mcol_4648

Revert "Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency."

This reverts commit 83eac11b18937ecb0b4c754dd48e4cb47310f620.
Due to backwards compatibility issues.

Refactor casting to MCS[U]Int to datatype functions.

Update regression tests to include other affected datatypes.

Apply formatting.

Refactor according to PR review

Remove redundant new constant, switch to using already existing constant.

Adapt nullstring casting to EMPTYROW markers for backwards compatibility.

Adapt tests for backward compatibility behaviour allowing text datatypes to be cast to EMPTYROW constant.

Adapt mcol641-functions test according to bug fix.

Update tests according to new expected behaviour.

Adapt tests to new understanding of issue.

Update comments/documentation for MCOL_4632 test.

Adapt to new cast limit logic.

Make bracketing consistent.

Adapt previous regression test to new expected behaviour.
2023-08-11 13:00:30 +00:00
1a49a09af3 Merge pull request #2878 from denis0x0D/MCOL-5514_dev_1
MCOL-5514 Parallel disk join step
2023-07-25 14:32:13 +01:00
65cde8c894 feature: pron (#2908)
* feature: Special dictionary that we can pass with a session variable to modify code paths and behaviour for testing and debugging
2023-07-21 14:02:03 +03:00
8d06822be5 atomic stop flag 2023-07-12 18:17:13 +03:00
bab29ff495 Simpler Config 2023-07-12 18:15:26 +03:00
2a66ae2ed1 MCOL-5514 Parallel disk join step. 2023-07-11 14:05:14 +03:00
ebfb9face2 compiler failures with gcc 12.x
a workaround for something that looks like a bug in a compiler.
Fixes errors like

In file included from /usr/include/c++/12/string:40,
                 from /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:26:
In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type*, const char_type*, std::size_t)’,
    inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:423:21,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.tcc:532:22,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:2171:19,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::insert(size_type, const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:1928:22,
    inlined from ‘virtual std::string funcexp::Func_format::getStrVal(rowgroup::Row&, funcexp::FunctionParm&, bool&, execplan::CalpontSystemCatalog::ColType&)’ at /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:2008:17:
/usr/include/c++/12/bits/char_traits.h:431:56: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets 3 and [2, 2147483645] may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict]
  431 |         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n );

$ gcc --version
gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
2023-07-04 12:58:18 -04:00
6d44d2e850 MCOL-5500 Remove another noisy printout. (#2886)
Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
2023-06-29 15:46:00 +03:00