1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-30 19:23:07 +03:00
Commit Graph

331 Commits

Author SHA1 Message Date
449029a827 Deep build refactoring phase 2 (#3564)
* configcpp refactored

* chore(build): massive removals, auto add files to debian install file

* chore(build): configure before autobake

* chore(build): use custom cmake commands for components, mariadb-plugin-columnstore.install generated

* chore(build): install deps as separate step for build-packages

* more deps

* chore(codemanagement, build): build refactoring stage2

* chore(safety): Locked Map for MessageqCpp with a simpler way

 Please enter the commit message for your changes. Lines starting

* chore(codemanagement, ci): better coredumps handling, deps fixed

* Delete build/bootstrap_mcs.py

* Update charset.cpp (add license)
2025-07-17 16:14:10 +04:00
a501ef8721 fix string args 2025-07-04 19:57:49 +04:00
d031239844 add args to lost connection error when getting system data 2025-07-04 19:57:49 +04:00
40f4ee7008 add args to system catalog err 2025-07-04 19:57:49 +04:00
8a2ae35918 chore(): review fixes 2025-06-26 18:35:33 +01:00
464b9a1ca3 chore(review): clean up leftovers 2025-06-26 18:35:33 +01:00
327231276d chore(): remove unused standalone unit test 2025-06-26 18:35:33 +01:00
e57832ee64 feat(optimizer): temporary shield optimizer with a session variable 2025-06-26 18:35:33 +01:00
98cb6dddee feat(optimizer): replace simple walk with iterative DFS with convergence 2025-06-26 18:35:33 +01:00
1baaf878d0 feat(optimizer): basic rewrite Union unit into Sub with union 2025-06-26 18:35:33 +01:00
e8dc93b46d feat(optimizer): better CSEP printer + shallow CSEP copy 2025-06-26 18:35:33 +01:00
e73e5834ab feat(optimizer): first cut for rewrite foreign table into UNION rule 2025-06-26 18:35:33 +01:00
79008f4f69 feat(CSEP): CSEP printer with indentations to simplify reading + rewriter skeleton + some test binary to describe minimalistic CSEP localy 2025-06-26 18:35:33 +01:00
aa7e0fb9b4 Deep build refactoring phase 1 (#3562)
* configcpp refactored
* logging and datatypes refactored

* more dataconvert
* chore(build): massive removals, auto add files to debian install file
* chore(codemanagement): nodeps headers, potentioal library
* chore(build): configure before autobake
* chore(build): use custom cmake commands for components, mariadb-plugin-columnstore.install generated
* chore(build): install deps as separate step for build-packages
* more deps
* check  debian/mariadb-plugin-columnstore.install automatically
* chore(build): add option for multibracnh compilation
* Fix warning
2025-05-30 14:05:21 +04:00
5814a80b50 MCOL-4671: MCOL-4622: fix the behavior of both PRs
first was playing different with RIGHT and LEFT functions(using the getUintVal and getIntVal accordingly)
https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/3234
second introduced round for ints from double, but added it to uint but not to int missing long doubles as well
https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/3480
2025-05-23 05:12:17 +04:00
3bb2496ca1 fix: MCOL-5755: incorrect handling of BLOB (and TEXT) in GROUP BY
BLOB fields did not work as grouping keys at all, they were assigned
value NULL for any value, be it NULL or not. The fix is in the
rowaggregation.cpp in the initMapping(), a switch/case branch was added
to handle BLOB field copying there.

Also, TEXT columns did not distinguish between NULL and empty string in
the grouping algorithm, now they do. The fix is in the equals()
function, now we specifically check for isNull() equality between
values.
2025-05-23 05:12:17 +04:00
b65a5a1ef9 chore(build): turn off WError for ASAN builds as gcc STL has internal warnings last 3 years: https://gcc.gnu.org/bugzilla/show_bug.cgi\?id\=105562 2025-05-21 21:59:08 +04:00
6db2dc668f stubs and cmake formatting 2025-05-20 18:22:59 +04:00
2036e521c7 named linkage 2025-05-20 18:22:59 +04:00
a0bee173f6 chore(build): fixes to satisfy clang19 warnings 2025-05-15 19:05:38 +04:00
8859e3f4df chore(build): satisfy gcc9 for execplan partionions unequivalence 2025-04-25 17:36:43 +04:00
bd1622f331 feat(MCOL-5886): support InnoDB's table partitions in cross-engine joins
The purpose of this changeset is to obtain list of partitions from
SELECT_LEX structure and pass it down to joblist and then to
CrossEngineStep to pass to InnoDB.
2025-04-23 08:24:10 +03:00
4bea7e59a0 feat(PrimProc): MCOL-5852 disk-based GROUP_CONCAT & JSON_ARRAYAGG
* move GROUP_CONCAT/JSON_ARRAYAGG storage to the RowGroup from
  the RowAggregation*
* internal data structures (de)serialization
* get rid of a specialized classes for processing JSON_ARRAYAGG
* move the memory accounting to disk-based aggregation classes
* allow aggregation generations to be used for queries with
  GROUP_CONCAT/JSON_ARRAYAGG
* Remove the thread id from the error message as it interferes with the mtr
2025-04-11 15:21:07 +02:00
b8c0b74f2b fix(funexp): MCOL-4622 Implicit FLOAT->INT and DOUBLE->INT conversion is not like in InnoDB (#3480) 2025-04-04 21:28:16 +01:00
0ab03c7258 chore(codestyle): mark virtual methods as override 2025-02-21 20:01:34 +04:00
60dc7550f1 fix(group by, having): MCOL-5776: GROUP BY/HAVING closer to server's (#3371)
This patch introduces an internal aggregate operator SELECT_SOME that
is automatically added to columns that are not in GROUP BY. It
"computes" some plausible value of the column (actually, last one
passed).

Along the way it fixes incorrect handling of HAVING being transferred
into WHERE, window function handling and a bit of other inconsistencies.
2024-12-20 19:11:47 +00:00
5e5d328269 Fix build 2024-12-10 20:30:58 +04:00
50a31d1296 Fix build 2024-12-10 20:30:58 +04:00
6b2334cecf Fix build 2024-12-10 20:30:58 +04:00
39a976c39a fix(ubsan): MCOL-5844 - iron out UBSAN reports
The most important fix here is the fix of possible buffer overrun in
DATEFORMAT() function. A "%W" format, repeated enough times, would
overflow the 256-bytes buffer for result. Now we use ostringstream to
construct result and we are safe.

Changes in date/time projection functions made me fix difference between
us and server behavior. The new, better behavior is reflected in changes
in tests' results.

Also, there was incorrect logic in TRUNCATE() and ROUND() functions in
computing the decimal "shift."
2024-12-10 20:30:58 +04:00
539db054b3 MCOL-5779: use encoding to check alter table alter column statement correctly 2024-08-30 17:54:56 +04:00
4b411b3968 MCOL-4696: get rid of boost::iequals 2024-08-21 21:35:52 +04:00
db4cb1d657 MCOL-4234 and MCOL 5772 cherry-picked into [stable 23.10] (#3226)
* MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194)

This patch fixes the problem in MCOL-4234 and also generally improves
behavior of GROUP BY.

It does so by introducing a "dummy" aggregate and by wrapping columns
into it. This allows for columns that are not in GROUP BY to be used
more freely, for example, in SELECT * FROM tbl GROUP BY col - all
columns that are not "col" will be wrapped into an aggregate and query
will proceed to execution.

The dummy aggregate itself does nothing more than remember last value
passed into it.

There also an additional error message that tries to explain what types
of expressions can be wrapped into an aggregate.

* MCOL-5772: incorrect ORDER BY ordering for a columns not in GROUP BY (#3214)

When ORDER BY column is not in GROUP BY, is not an aggregate and there
is a SELECT column that is also not an aggregate, there was a problem:
ordering happened on the SELECTed column, not ORDERed one.

This patch fixes that particular problem and also performs some tidying
around newly added aggregate.

---------

Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>
2024-06-28 00:31:53 +04:00
e3c5e10207 fix(datatypes, funcexp): static_cast typo fix (#3001) 2023-10-25 20:14:38 +03:00
b826fc1fd6 fix(datatypes, funcexp): Overflow detection for MCOL-5568 use case (and some other) (#2987)
We add intermediate calculations in int128_t when target is UBIGINT and
check for overflow before converting into the UBIGINT. This is so
because we can overflow on addition and multiplication, with (some)
signed operands or both unsigned.
2023-10-25 20:14:38 +03:00
320df831c6 MCOL-5572 Force the charset on the autoincrement column of (#2976)
calpontsys.syscolumn syscat table to be latin1.

This change is done in one of the ctors of pColStep which is
initiated while building the job list from the execution plan.
2023-09-28 22:03:39 +03:00
920607520c feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support
Adds a special column which helps to differentiate data and rollups of
various depts and a simple logic to row aggregation to add processing of
subtotals.
2023-09-26 17:01:53 +03:00
4bfce51628 Fix autoincrement filtering problems with utf-8 (#2964)
MCOL-5572: Widen the autoincrement column to accomodate utf-8  encoded into weights with strnxfrm function.
2023-09-22 16:40:10 +03:00
1c9cd9db9f Fix garbage charset using ColType(int32_t colWidth_, int32_t scale_, int32_t precision_, (#2949)
const ConstraintType& constraintType_, const DictOID& ddn_, int32_t colPosition_,
            int32_t compressionType_, OID columnOID_, const ColDataType& colDataType_);
2023-09-06 20:01:31 +03:00
931f2b36a1 MCOL-4931 Make cpimport charset-aware. (#2938)
1. Extend the following CalpontSystemCatalog member functions to
   set CalpontSystemCatalog::ColType::charsetNumber, after the
   system catalog update to add charset number to calpontsys.syscolumn
   in MCOL-5005:
     CalpontSystemCatalog::lookupOID
     CalpontSystemCatalog::colType
     CalpontSystemCatalog::columnRIDs
     CalpontSystemCatalog::getSchemaInfo

2. Update cpimport to use the CHARSET_INFO object associated with the
   charset number retrieved from the system catalog, for a
   dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate
   long strings that exceed the target column character length.

3. Add MTR test cases.
2023-09-05 17:17:20 +03:00
05547f2342 Add a limit (as runtime value) for long in queries 2023-08-21 10:38:46 +03:00
6ff121a91c Replace recursion with iteration in ParseTree (and some related walkers) 2023-08-21 10:36:41 +03:00
f55d41c079 Merge pull request #2912 from tntnatbry/MCOL-5005
MCOL-5005 Add charset number to system catalog.
2023-08-15 22:22:21 +02:00
d50a0fa2e6 MCOL-5005 Add charset number to system catalog - Part 2.
1. Extend the calpontsys.syscolumn system catalog table
  with a new column, 'charsetnum'.

  'charsetnum' field is set to the 'number' member of the
  'charset_info_st' struct defined in the server in m_ctype.h.

  For CHAR/VARCHAR/TEXT column types, 'charset_info_st' is
  initialized to the charset/collation of the column, which
  is set at the column-level or at the table-level in the DDL.

  For BLOB/VARBINARY binary column types, 'charset_info_st' is
  initialized to my_charset_bin (charsetnum=63).

  For all other column types, charsetnum is set to 0.

  2. Add support for the newly added 'charsetnum' column in the
  automatic system catalog upgrade logic in dbbuilder.

  For existing table definitions, charsetnum for the column is
  defaulted to 0.

  3. Add MTR test case that creates a few table definitions with
  a range of charset/collation combinations and queries the
  calpontsys.syscolumn system catalog table with the charsetnum
  field for the columns in the table DDLs.
2023-08-15 17:21:47 +00:00
896e8dd769 MCOL-5522 Properly process pm join result count. (#2909)
This patch:
1. Properly processes situation when pm join result count is exceeded.
2. Adds session variable 'columnstore_max_pm_join_result_count` to control the limit.
2023-08-04 16:55:45 +03:00
65cde8c894 feature: pron (#2908)
* feature: Special dictionary, we can pass with session veriable to modify codepaths and behaviour for testing and debugging
2023-07-21 14:02:03 +03:00
ebfb9face2 compiler failures with gcc 12.x
a workaround for something that looks like a bug in a compiler.
Fixes errors like

In file included from /usr/include/c++/12/string:40,
                 from /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:26:
In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type*, const char_type*, std::size_t)’,
    inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:423:21,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.tcc:532:22,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:2171:19,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::insert(size_type, const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:1928:22,
    inlined from ‘virtual std::string funcexp::Func_format::getStrVal(rowgroup::Row&, funcexp::FunctionParm&, bool&, execplan::CalpontSystemCatalog::ColType&)’ at /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:2008:17:
/usr/include/c++/12/bits/char_traits.h:431:56: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets 3 and [2, 2147483645] may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict]
  431 |         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n );

$ gcc --version
gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
2023-07-04 12:58:18 -04:00
501da394ca Replace std::set contains method with count
to support Rocky/RHEL/Alma 8 where the std::set
in the stock STL does not have contains method
2023-07-04 12:58:18 -04:00
1f190a6e75 MCOL-5477 Disk join step improvement.
This patch:
1. Handles corner case when the bucket exceeded the memory limit, but we cannot redistribute the data in this bucket into new buckets based on a hash algorithm, because the rows have the same values.
2. Adds force option for disk join step.
3. Add a option to contol the depth of the partition tree.
2023-06-23 18:40:15 +03:00
8f93fc3623 MCOL-5493: First portion of UBSan fixes (#2842)
Multiple UB fixes
2023-06-02 17:02:09 +03:00