mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-08-01 06:46:55 +03:00
Commit Graph

1647 Commits

Author SHA1 Message Date
add3a57e8d MCOL-5539 Put table on small side if it was involved in prev.join. (#2945) 2023-09-05 12:19:43 +03:00
d586975da7 Rename a limit var + change error message (#2946)
* Rename a limit var + change error message

* Adjust the test
2023-09-05 12:19:15 +03:00
05547f2342 Add a limit (as runtime value) for long in queries 2023-08-21 10:38:46 +03:00
6ff121a91c Replace recursion with iteration in ParseTree (and some related walkers) 2023-08-21 10:36:41 +03:00
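A minimal sketch of the general technique named in the commit above: a tree walk driven by an explicit stack rather than recursion, so very deep trees cannot exhaust the call stack. The `Node` type and `walkPostOrder` function are hypothetical stand-ins and do not reflect the engine's actual ParseTree interface.

```cpp
#include <algorithm>
#include <stack>
#include <string>
#include <vector>

// Hypothetical binary tree node used only for this illustration.
struct Node
{
  std::string data;
  Node* left = nullptr;
  Node* right = nullptr;
};

// Post-order walk (left, right, root) without recursion.
std::vector<std::string> walkPostOrder(Node* root)
{
  std::vector<std::string> out;
  if (root == nullptr)
    return out;

  std::stack<Node*> pending;
  pending.push(root);
  while (!pending.empty())
  {
    Node* n = pending.top();
    pending.pop();
    out.push_back(n->data);
    // Left is pushed first so that the right subtree is popped first.
    if (n->left)
      pending.push(n->left);
    if (n->right)
      pending.push(n->right);
  }
  // Nodes were collected as root, right, left; reversing gives post-order.
  std::reverse(out.begin(), out.end());
  return out;
}
```

The memory used by the walk now lives in a heap-allocated stack whose size depends on the tree, not on the thread's call stack limit.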
f55d41c079 Merge pull request #2912 from tntnatbry/MCOL-5005
MCOL-5005 Add charset number to system catalog.
2023-08-15 22:22:21 +02:00
d50a0fa2e6 MCOL-5005 Add charset number to system catalog - Part 2.
1. Extend the calpontsys.syscolumn system catalog table
  with a new column, 'charsetnum'.

  'charsetnum' field is set to the 'number' member of the
  'charset_info_st' struct defined in the server in m_ctype.h.

  For CHAR/VARCHAR/TEXT column types, 'charset_info_st' is
  initialized to the charset/collation of the column, which
  is set at the column-level or at the table-level in the DDL.

  For BLOB/VARBINARY binary column types, 'charset_info_st' is
  initialized to my_charset_bin (charsetnum=63).

  For all other column types, charsetnum is set to 0.

  2. Add support for the newly added 'charsetnum' column in the
  automatic system catalog upgrade logic in dbbuilder.

  For existing table definitions, charsetnum for the column is
  defaulted to 0.

  3. Add MTR test case that creates a few table definitions with
  a range of charset/collation combinations and queries the
  calpontsys.syscolumn system catalog table with the charsetnum
  field for the columns in the table DDLs.
2023-08-15 17:21:47 +00:00
64f1d541d0 MCOL-5519: new defaults in columnstore.cnf (#2894)
feat(charset)!: utf8 is a new charset default and utf8_general_ci is a new collation default in the engine configuration file shipped
---------

Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>
Co-authored-by: mariadb-DanielLee <daniel.lee@mariadb.com>
2023-08-15 18:04:32 +03:00
712d34a407 MCOL-4988 Table lock remained after DML failure due to DBRM in read-only mode.
DMLProcessor functor earlier did not check if the DBRM was in read-only mode.
This allowed DML statements to continue execution to the point where it locks
the table and then sends the statement down to the WriteEngineServer, which
ultimately returns back in an error state to DMLProc when it fails to perform
BRM updates due to DBRM in read-only mode. This caused a lingering table lock
in the system which could only be cleared on a system restart.

As a fix, we add a check in the DMLProcessor functor to detect if DBRM is in
read only mode, and if so, return back early in the execution of the DML
statement.
2023-08-15 10:25:27 -04:00
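The fix described above amounts to checking the read-only state before any table lock is taken. The sketch below illustrates that ordering only; `Dbrm`, `TableLockManager`, `DmlStatus` and `processDml` are hypothetical names invented for this example, not the DMLProcessor code.

```cpp
// Hypothetical stand-ins for illustration only; these are not ColumnStore classes.
struct Dbrm
{
  bool readOnly = false;
  bool isReadOnly() const { return readOnly; }
};

struct TableLockManager
{
  int acquired = 0;  // number of locks ever taken
  int held = 0;      // number of locks currently held
  void lock() { ++acquired; ++held; }
  void unlock() { --held; }
};

enum class DmlStatus { Ok, ReadOnlyError };

DmlStatus processDml(const Dbrm& dbrm, TableLockManager& locks)
{
  // Return early, before locking, when DBRM cannot accept updates.
  if (dbrm.isReadOnly())
    return DmlStatus::ReadOnlyError;

  locks.lock();
  // ... the statement would be handed to the write path here ...
  locks.unlock();
  return DmlStatus::Ok;
}
```

Because the check precedes `lock()`, a statement rejected for read-only mode never leaves a lock behind.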
48562e41f9 feat(datatypes): MCOL-4632 and MCOL-4648, fix cast leads to NULL.
Remove redundant cast.

As C-style casts with a type name in parentheses are interpreted as static_casts this literally just changes the interpretation around (and forces an implicit cast to match the return value of the function).

Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency.

Make this consistent with the relation between BIGINTNULL and BIGINTEMPTYROW and make the cast behaviour around NULL markers more intuitive. (After this change we can simply block the highest possible uint64_t value and, if a cast results in it, print the next lower value (2^64 - 2). Previously, (2^64 - 1) could be printed but (2^64 - 2), being blocked by the UBIGINTNULL constant, could not, which made finding the appropriate replacement value more confusing.)

Introduce MAX_MCS_UBIGINT and MIN_MCS_BIGINT and adapt casts.

Adapt casting to BIGINT to remove NULL marker error.

Add bugfix regression test for MCOL 4632

Add regression test for mcol_4648

Revert "Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency."

This reverts commit 83eac11b18937ecb0b4c754dd48e4cb47310f620.
Due to backwards compatibility issues.

Refactor casting to MCS[U]Int to datatype functions.

Update regression tests to include other affected datatypes.

Apply formatting.

Refactor according to PR review

Remove redundant new constant, switch to using already existing constant.

Adapt nullstring casting to EMPTYROW markers for backwards compatibility.

Adapt tests for backward compatibility behaviour allowing text datatypes to be cast to EMPTYROW constant.

Adapt mcol641-functions test according to bug fix.

Update tests according to new expected behaviour.

Adapt tests to new understanding of issue.

Update comments/documentation for MCOL_4632 test.

Adapt to new cast limit logic.

Make bracketing consistent.

Adapt previous regression test to new expected behaviour.
2023-08-11 13:00:30 +00:00
896e8dd769 MCOL-5522 Properly process pm join result count. (#2909)
This patch:
1. Properly processes situation when pm join result count is exceeded.
2. Adds session variable 'columnstore_max_pm_join_result_count' to control the limit.
2023-08-04 16:55:45 +03:00
4f580d109d Fix a compiler error related to signed v/s unsigned integer comparison. (#2915) 2023-08-04 16:54:40 +03:00
a36ea6dbb4 MCOL-5005 Add charset number to system catalog - Part 1.
This patch improves/fixes the existing handling of CHARSET and
COLLATION symbols in the ColumnStore DDL parser.

Also, add fCollate and fCharsetNum member variables to the
ddlpackage::ColumnType class.
2023-07-28 18:36:53 -04:00
1a49a09af3 Merge pull request #2878 from denis0x0D/MCOL-5514_dev_1
MCOL-5514 Parallel disk join step
2023-07-25 14:32:13 +01:00
65cde8c894 feature: pron (#2908)
* feature: Special dictionary that can be passed via a session variable to modify code paths and behaviour for testing and debugging
2023-07-21 14:02:03 +03:00
2a66ae2ed1 MCOL-5514 Parallel disk join step. 2023-07-11 14:05:14 +03:00
ebfb9face2 compiler failures with gcc 12.x
a workaround for something that looks like a bug in a compiler.
Fixes errors like

In file included from /usr/include/c++/12/string:40,
                 from /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:26:
In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type*, const char_type*, std::size_t)’,
    inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:423:21,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.tcc:532:22,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::replace(size_type, size_type, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:2171:19,
    inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::insert(size_type, const _CharT*) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:1928:22,
    inlined from ‘virtual std::string funcexp::Func_format::getStrVal(rowgroup::Row&, funcexp::FunctionParm&, bool&, execplan::CalpontSystemCatalog::ColType&)’ at /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:2008:17:
/usr/include/c++/12/bits/char_traits.h:431:56: error: ‘void* __builtin_memcpy(void*, const void*, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets 3 and [2, 2147483645] may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict]
  431 |         return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n );

$ gcc --version
gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
2023-07-04 12:58:18 -04:00
501da394ca Replace std::set contains method with count
to support Rocky/RHEL/Alma 8 where the std::set
in the stock STL does not have contains method
2023-07-04 12:58:18 -04:00
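For reference, `std::set::contains()` was added in C++20, while `count()` has been available since the earliest standard library versions. A small sketch of the portable form follows; the helper name `hasKey` is made up for this example.

```cpp
#include <set>

// Membership test that also compiles with pre-C++20 standard libraries.
// For std::set, count() returns 0 or 1, so comparing against 0 is enough.
template <typename T>
bool hasKey(const std::set<T>& s, const T& key)
{
  return s.count(key) != 0;
}
```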
a8be4a3787 compiler warnings
like

dbcon/joblist/batchprimitiveprocessor-jl.cpp:893:54: error: pointer used after ‘void operator delete [](void*, std::size_t)’ [-Werror=use-after-free]
  893 |           joinResults.reset(new vector<uint32_t>[8192]);
      |                                                      ^
2023-07-04 12:58:18 -04:00
2aba28d855 Merge pull request #2851 from denis0x0D/MCOL-5477
MCOL-5477 Disk join step improvement.
2023-06-26 11:02:20 +03:00
1f190a6e75 MCOL-5477 Disk join step improvement.
This patch:
1. Handles the corner case where a bucket exceeds the memory limit but its data cannot be redistributed into new buckets by the hash algorithm, because the rows have the same values.
2. Adds a force option for the disk join step.
3. Adds an option to control the depth of the partition tree.
2023-06-23 18:40:15 +03:00
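The corner case in item 1 can be illustrated as follows: if every row in an oversized bucket carries the same join key, any hash function maps them all to the same new bucket, so re-partitioning cannot shrink it. The sketch only shows how such a bucket could be detected; `canSplitByHash` is a hypothetical helper, and how the patch actually handles the case is not stated in the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A bucket can only be split by hashing if it holds at least two distinct
// keys; identical keys always land in the same target bucket.
bool canSplitByHash(const std::vector<uint64_t>& keys)
{
  for (std::size_t i = 1; i < keys.size(); ++i)
  {
    if (keys[i] != keys[0])
      return true;
  }
  return false;
}
```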
024e6bd358 MCOL-5512 Fix for post join filter.
This patch fixes certain situations where the post-join filter is not applied.
2023-06-09 11:15:05 +03:00
62dc392476 MCOL-5499 Enable ControlFlow for same node communication processing path to avoid DEC queue overloading (#2848) 2023-06-07 15:41:59 +03:00
8f93fc3623 MCOL-5493: First portion of UBSan fixes (#2842)
Multiple UB fixes
2023-06-02 17:02:09 +03:00
c598a9bbed MCOL-5480 LOAD DATA INFILE incorrectly loads values for MEDIUMINT datatype.
Internal memory representation of MEDIUMINT datatype uses 24 bits. This is
true for both MariaDB server as well as ColumnStore. MCS plugin code uses
TypeHandlerSInt24 and TypeHandlerUInt24 classes to respectively convert the
binary representation of the signed and unsigned MEDIUMINT values passed by
the server to the plugin. The plugin then outputs the text representation
of these values into an open file descriptor which is piped to cpimport
for the final load into the MCS db files.

The TypeHandlerXInt24 classes were earlier incorrectly using
WriteBatchField::ColWriteBatchXInt32() functions which operate on a 4 byte
buffer. This resulted in incorrect parsing of MEDIUMINT values. As a fix,
we implement WriteBatchField::ColWriteBatchXInt24() functions which
correctly handle the 24 bit input buffer used for MEDIUMINT datatype.
2023-05-23 16:00:05 -04:00
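To make the 24-bit handling concrete, here is a minimal sketch of decoding a 3-byte MEDIUMINT buffer. It assumes a little-endian byte layout, and the function names `readInt24` and `readUInt24` are illustrative; they are not the WriteBatchField functions mentioned above.

```cpp
#include <cstdint>

// Decode an unsigned 24-bit value from exactly 3 bytes
// (assumed little-endian: least significant byte first).
uint32_t readUInt24(const unsigned char* buf)
{
  return static_cast<uint32_t>(buf[0]) | (static_cast<uint32_t>(buf[1]) << 8) |
         (static_cast<uint32_t>(buf[2]) << 16);
}

// Decode a signed 24-bit value; bit 23 is the sign bit.
int32_t readInt24(const unsigned char* buf)
{
  const uint32_t raw = readUInt24(buf);  // always in [0, 2^24 - 1]
  int32_t value = static_cast<int32_t>(raw);
  if (raw & 0x800000u)
    value -= 0x1000000;  // subtract 2^24 to sign-extend
  return value;
}
```

Reading the same value through a 4-byte routine would pull in one byte that does not belong to the field, which is the kind of misparse the commit describes.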
de7ba854bd Merge pull request #2840 from tntnatbry/MCOL-5491
MCOL-5491 Enable StringStore for long strings in JSON_ARRAYAGG processing.
2023-05-17 12:12:03 +01:00
87eb875379 MCOL-5491 Enable StringStore for long strings in JSON_ARRAYAGG processing.
This patch is the JSON_ARRAYAGG clone of the changes done in MCOL-5429
where we enabled usage of StringStore for long strings in
GROUP_CONCAT() processing to reduce memory footprint of PrimProc and
thus avoiding a potential OS triggered OOM crash.
2023-05-12 19:45:02 +00:00
1477b28ee9 MCOL-5357 Fix TPC-DS query error "MCS-3009: Unknown column '.<colname>'".
For the following query:

select item from (
select item from (select a as item from t1) tt
union all
select item from (select a as item from t1) tt
) ttt;

There is an if predicate in buildSimpleColFromDerivedTable() that compares
the outermost query field name (ttt.item) to the returned column list of
the inner query (tt.item) when building the returned column list of the
outermost query. In the above query example, the inner query field name
is an alias set in the innermost query and is set to "`tt`.`item`",
while the outermost query field name is set to "item". The use of
backticks "`" in the inner query alias is causing the execution to
not enter the if block which creates the SimpleColumn for the outermost
query field name. As a fix, we strip off the backticks from the inner
query alias.
2023-05-03 16:06:20 +00:00
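The backtick stripping mentioned above can be sketched with the standard erase-remove idiom. The helper name `stripBackticks` is hypothetical, and the sketch does not reproduce the comparison logic inside buildSimpleColFromDerivedTable().

```cpp
#include <algorithm>
#include <string>

// Remove every backtick from an identifier such as "`tt`.`item`".
std::string stripBackticks(std::string s)
{
  s.erase(std::remove(s.begin(), s.end(), '`'), s.end());
  return s;
}
```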
0be1c3dc8f MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing.
1. Input and output RowGroups used in GROUP_CONCAT classes
are currently allocating a raw memory buffer of size equal
to the actual width of the string datatype. As an example,
for the following query:
  SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with default width, the input
RowGroup containing the target rows to be concatenated will
assign 64kb of memory for every input row in the RowGroup.
This is wasteful as actual field values in real workloads
would be much smaller. We fix this by enabling the
RowGroup to use the StringStore when the RowGroup contains
long strings.

2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer is equal to the
row size for the output RowGroup. For the above scenario,
using the default group_concat_max_len (which is a server
variable that sets the maximum length of the GROUP_CONCAT string)
value of 1mb, the buffer size would be
(1mb + 64kb + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3gb, this buffer
size would be ~3gb. Now if the runtime initiates several
instances of RowAggregation, total memory consumption by
PrimProc could exceed the hardware memory limits causing the
OS OOM to kill the process. We fix this problem by again
enabling the StringStore for the NULL row allocation.

3. In the plugin code in buildAggregateColumn(), there is
an integer overflow when the server group_concat_max_len
variable (which is an uint32_t) is set to a value > INT32_MAX
(such as 3gb) and is assigned to
CalpontSystemCatalog::ColType::colWidth (which is an int32_t).
As a short term fix, we saturate the assigned value to colWidth
to INT32_MAX. Proper fix would be to upgrade
CalpontSystemCatalog::ColType::colWidth to an uint32_t.
2023-05-01 13:06:23 -04:00
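The short-term fix in item 3 is a saturating conversion. A minimal sketch follows, assuming only the types named in the commit message (a uint32_t source and an int32_t destination); the function name `saturateToInt32` is made up for this example.

```cpp
#include <cstdint>
#include <limits>

// Clamp an unsigned 32-bit value into the int32_t range instead of letting
// the conversion wrap around to a negative number.
int32_t saturateToInt32(uint32_t v)
{
  constexpr int32_t kMax = std::numeric_limits<int32_t>::max();
  return v > static_cast<uint32_t>(kMax) ? kMax : static_cast<int32_t>(v);
}
```

With a 3gb setting (3221225472 bytes), the result is INT32_MAX rather than a negative width.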
4fe9cd64a3 Revert "No boost condition (#2822)" (#2828)
This reverts commit f916e64927.
2023-04-22 15:49:50 +03:00
f916e64927 No boost condition (#2822)
This patch replaces boost primitives with stdlib counterparts.
2023-04-22 00:42:45 +03:00
3ce19abdae Options to build with TSAN, UBSAN and skipping smoke (#2826) 2023-04-21 21:24:48 +03:00
c2d0fa24da replace boost::shared_array<T> with std::shared_ptr<T[]> 2023-04-14 10:33:27 +00:00
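As background for this replacement: since C++17, `std::shared_ptr<T[]>` releases its buffer with `delete[]` and provides `operator[]`, which covers the common uses of `boost::shared_array<T>`. A small sketch, with a hypothetical helper name `makeFilled`:

```cpp
#include <cstddef>
#include <memory>

// Allocate a shared array of n ints and fill it with one value.
// Requires C++17 for the array form of std::shared_ptr.
std::shared_ptr<int[]> makeFilled(std::size_t n, int value)
{
  std::shared_ptr<int[]> p(new int[n]);
  for (std::size_t i = 0; i < n; ++i)
    p[i] = value;
  return p;
}
```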
a508b86091 remove boost/shared_array include 2023-04-14 09:42:50 +00:00
6c32c658d5 MCOL-5385: Delete RowGroup::setData and make Pointer ctor explicit (#2808)
* Delete RowGroup::setData and make Pointer ctor explicit

* some push_backs replaced with emplace_backs

* Fixes of review notes
2023-04-13 03:55:30 +03:00
2e1394149b MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792)
* Fixes of bugs from ASAN warnings, part one

* MQC as static library, with nifty counter for global map and mutex

* Switch clang to 16

* link messageqcpp to execplan
2023-04-04 02:33:23 +03:00
b53c231ca6 MCOL-271 empty strings should not be NULLs (#2794)
This patch improves handling of NULLs in textual fields in ColumnStore.
Previously, empty strings were considered NULLs, which could be a problem
if the data schema allows empty strings. It was also one of the major
reasons for behavioral differences between ColumnStore and other engines
in the MariaDB family.

This patch also fixes some other bugs and incorrect behavior, for
example, the incorrect comparison for "column <= ''", which evaluated
to constant True for all purposes before this patch.
2023-03-30 21:18:29 +03:00
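The distinction this patch introduces can be modelled with `std::optional`: an absent value stands for NULL and a present but empty string stands for ''. This is only a conceptual sketch; it uses plain binary string comparison, ignores collations, and does not show how ColumnStore stores these values. All names are hypothetical.

```cpp
#include <optional>
#include <string>

// Absent optional models NULL; a present empty string models ''.
using NullableText = std::optional<std::string>;

bool isNull(const NullableText& v)
{
  return !v.has_value();
}

// Models the predicate "column <= ''" for use in a filter:
// NULL never passes, and only values not greater than '' pass.
bool lessOrEqualEmpty(const NullableText& v)
{
  return v.has_value() && *v <= std::string();
}
```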
256691652d MCOL-4530: toCppCode() method for ParseTree and TreeNode (#2777)
* toCppCode for ParseTree and TreeNode

* generated tree is compiling

* Put tree constructors into tests

* Minor fixes

* Fixed parse + some constructors

* Fixed includes, removed debug and old data

* Hopefully fix clang errors

* Forgot an override

* More overrides
2023-03-22 23:25:06 +03:00
70124ecc01 Fix trivial spelling errors
- occured -> occurred
- reponse -> response
- seperated -> separated

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
2023-03-11 11:59:47 -08:00
786b9da5b0 MCOL-5438 COUNT() in math causes SEGV 2023-03-09 20:35:38 +00:00
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
4d4e4ad30d Merge pull request #2741 from mariadb-corporation/MDEV-25080-CS-dev
MDEV-25080 Allow pushdown of queries involving UNIONs in outer select to ColumnStore
2023-02-28 11:23:50 +00:00
b6808c97f1 MCOL-4530: common conjunction top rewrite (#2673)
Added a logical transformation of execplan::ParseTrees that factors out the common term in expressions of the form "(A and B) or (A and C)", for the purpose of passing TPCH query 19.

Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>
2023-02-27 19:23:19 +03:00
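The rewrite relies on the Boolean identity (A and B) or (A and C) = A and (B or C). The sketch below checks that identity over all eight input combinations; it is a plain truth-table check, not the ParseTree transformation itself, and the function names are invented for this example.

```cpp
// Left-hand side of the identity: the original filter shape.
bool original(bool a, bool b, bool c)
{
  return (a && b) || (a && c);
}

// Right-hand side: the common factor A taken out.
bool rewritten(bool a, bool b, bool c)
{
  return a && (b || c);
}

// Exhaustively compare both forms for every combination of inputs.
bool equivalentForAllInputs()
{
  for (int mask = 0; mask < 8; ++mask)
  {
    const bool a = (mask & 1) != 0;
    const bool b = (mask & 2) != 0;
    const bool c = (mask & 4) != 0;
    if (original(a, b, c) != rewritten(a, b, c))
      return false;
  }
  return true;
}
```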
2f1f9c0ef0 MDEV-25080 Some fixes:
1. In TupleUnion::writeNull(), add the missing switch case for
   wide decimal with 16bytes column width.
2. MCOL-5432 Disable complete/partial pushdown of UNION operation
   if the query involves an ORDER BY or a LIMIT clause, until
   MCOL-5222 is fixed. Also add MTR test cases for this.
2023-02-27 06:38:31 -05:00
e4100928d1 MDEV-25080 DISABLE pushdown of SELECT_LEX_UNIT for the prepare
phase of PS/SP statements.
2023-02-27 06:38:31 -05:00
86dcf92d56 MCOL-5215 Fix overflow of UNION operation involving DECIMAL datatypes.
When a UNION operation involving DECIMAL datatypes yields a scale plus
digits before the decimal point that exceeds the currently supported
maximum precision of 38, we throw an error to the user:
"MCS-2060: Union operation exceeds maximum DECIMAL precision of 38".

This is until MCOL-5417 is implemented where ColumnStore will have
full parity with MariaDB server in terms of maximum supported DECIMAL
precision and scale of 65 and 38 digits respectively.
2023-02-27 06:38:31 -05:00
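A rough sketch of the check described above. It assumes the result type of a DECIMAL UNION takes the largest scale and the largest number of integer digits across its inputs, which is a common SQL rule but is not confirmed by the commit message; `DecimalType` and `unionDecimal` are hypothetical names.

```cpp
#include <algorithm>
#include <stdexcept>

// precision = total digits, scale = digits after the decimal point.
struct DecimalType
{
  int precision;
  int scale;
};

// Combine two DECIMAL types for a UNION and reject results wider than 38.
DecimalType unionDecimal(DecimalType a, DecimalType b)
{
  const int intDigits = std::max(a.precision - a.scale, b.precision - b.scale);
  const int scale = std::max(a.scale, b.scale);
  if (intDigits + scale > 38)
    throw std::overflow_error("MCS-2060: Union operation exceeds maximum DECIMAL precision of 38");
  return DecimalType{intDigits + scale, scale};
}
```

Under this assumed rule, DECIMAL(38,0) combined with DECIMAL(38,10) would need 48 digits and is rejected.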
8cdcae0d2f MDEV-25080 Disable pushdown of SELECT_LEX_UNIT for CREATE VIEW statements. 2023-02-27 06:38:31 -05:00
45a779f743 MDEV-25080 Implement ColumnStore-side changes for pushdown of SELECT_LEX_UNITs. 2023-02-27 06:38:31 -05:00
ff534dba7f MCOL-5384 Replace the shared pointer to CSC with a CSC ctor whose instance is cleaned up when leaving the scope
The CSC default ctor was private because CSC must not be used outside the thread cache.
  However, some places in the plugin code need a standalone syscat that
  is cleaned up when leaving the scope. The decision is to make this restriction
  organizational rather than syntactical.
2023-02-08 14:03:41 +00:00
c1168e33aa Merge pull request #2712 from drrtuy/MCOL-5400
MCOL-5400 Disable group by pushdown
2023-01-27 16:17:24 +03:00
d87206c3e4 Fix segfault in getLocalNetIfacesSins (#2713) 2023-01-26 16:21:21 +03:00