mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00

Author	SHA1	Message	Date
Leonid Fedorov	c6e9b7d448	MCOL-5624: don't force columnstore_use_import_for_batchinsert option to be required to start mariadb server (#3078)	2023-12-26 15:12:01 +04:00
Leonid Fedorov	fadb102712	fix(writeengine) MCOL-4202: use schema name when renaming table and change its fields in syscat	2023-12-18 09:59:38 +03:00
Denis Khalikov	74c1a38f2c	fix(disk-based-join): MCOL-5626 Fix for race in DJS with outer join. (#3064 )	2023-12-15 11:20:27 +03:00
Denis Khalikov	9119f6f7b8	fix(aggregate): MCOL-5467 Add support for duplicate expressions in group by. (#3045 ) This patch adds support for duplicate expressions (builtin_functions) with one argument in select statement and group by statement.	2023-12-05 15:04:53 +03:00
Sergey Zefirov	8632c85ecf	feat(primproc,aggregegation)!: Changes for ROLLUP with single-phase aggregation (#3025 ) The fix is simple: enable subtotals in single-phase aggregation and disable parallel processing when there are subtotals and aggregation is single-phase.	2023-11-28 17:33:02 +03:00
Sergey Zefirov	9a84aa8d99	fix(plugin): Same columns from different views in GROUP BY do not produce errors (#3035) Fixes MCOL-5643. The problem was that different views with same column names in GROUP BY and on the SELECT clause produced an error about "projection column is not an aggergate neither in GROUP BY list." This was due to an incorrect search in the expressions list that led to duplicate columns in the GROUP BY list.	2023-11-28 17:30:56 +03:00
drrtuy	26f5f8fe5c	fix(plugin): this is to address the original patch QA found in the original patch	2023-11-22 17:20:37 +03:00
Sergey Zefirov	69b8e1c779	feat(extent-elimination)!: re-enable extent-elimination for dictionary columns scanning This is "productization" of an old code that would enable extent elimination for dictionary columns. This concrete patch enables it, fixes performance degradation (main problem with old code) and also fixes incorrect behavior of cpimport.	2023-11-17 17:14:35 +03:00
drrtuy	f5ff63b52f	Merge pull request #3011 from mariadb-corporation/MCOL-4740-2 MCOL-4740: This fixes update rows counter for multi-table update	2023-11-07 21:30:57 +02:00
Roman Nozdrin	6579180810	fix(plugin): MCOL-4740: This fixes update rows counter for multi-table update For UPDATEs involving a single table, the server call to handler::direct_update_rows() is used to correctly set the count for the number of updated rows in the UPDATE statement. However, for UPDATEs involving multi-tables, the server does not call handler::direct_update_rows(). This patch adds support to correctly report the number of updated rows to the client by setting multi_update::updated and multi_update::found in handler::rnd_end().	2023-11-02 14:18:06 +00:00
Roman Nozdrin	f7045457f2	chore(datatypes): refactoring math ops results domain check functionality	2023-10-25 09:12:54 +00:00
Roman Nozdrin	eb744eafed	chore(datatypes): this refactors the placement of the main SQL data types enum to enable templates that are parametrized with this enum(see mcs_datatype_basic.h changes for more details).	2023-10-24 18:44:35 +03:00
drrtuy	242408f751	fix(datatypes, funcexp): static_cast typo fix (#3001 )	2023-10-17 23:58:59 +03:00
Sergey Zefirov	84148cbe4c	fix(datatypes, funcexp): Overflow detection for MCOL-5568 use case (and some other) (#2987 ) We add intermediate calculations in int128_t when target is UBIGINT and check for overflow before converting into the UBIGINT. This is so because we can overflow on addition and multiplication, with (some) signed operands or both unsigned.	2023-10-16 16:55:02 +03:00
Gagan Goel	320df831c6	MCOL-5572 Force the charset on the autoincrement column of (#2976 ) calpontsys.syscolumn syscat table to be latin1. This change is done in one of the ctors of pColStep which is initiated while building the job list from the execution plan.	2023-09-28 22:03:39 +03:00
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support Adds a special column which helps to differentiate data and rollups of various depths and a simple logic to row aggregation to add processing of subtotals.	2023-09-26 17:01:53 +03:00
Leonid Fedorov	5013717730	fix(plugin): Fix wrong ask for stat call for table mode	2023-09-26 14:43:06 +03:00
Sergey Zefirov	4bfce51628	Fix autoincrement filtering problems with utf-8 (#2964) MCOL-5572: Widen the autoincrement column to accommodate utf-8 encoded into weights with strnxfrm function.	2023-09-22 16:40:10 +03:00
Leonid Fedorov	8171e9da07	Fix rocky-8 vanilla compiler build (#2959 ) Co-authored-by: Leonid Fedorov <leonid.fedorov@mariad.com>	2023-09-20 04:04:08 +03:00
Leonid Fedorov	1c9cd9db9f	Fix garbage charset using ColType(int32_t colWidth_, int32_t scale_, int32_t precision_, (#2949 ) const ConstraintType& constraintType_, const DictOID& ddn_, int32_t colPosition_, int32_t compressionType_, OID columnOID_, const ColDataType& colDataType_);	2023-09-06 20:01:31 +03:00
Gagan Goel	931f2b36a1	MCOL-4931 Make cpimport charset-aware. (#2938 ) 1. Extend the following CalpontSystemCatalog member functions to set CalpontSystemCatalog::ColType::charsetNumber, after the system catalog update to add charset number to calpontsys.syscolumn in MCOL-5005: CalpontSystemCatalog::lookupOID CalpontSystemCatalog::colType CalpontSystemCatalog::columnRIDs CalpontSystemCatalog::getSchemaInfo 2. Update cpimport to use the CHARSET_INFO object associated with the charset number retrieved from the system catalog, for a dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate long strings that exceed the target column character length. 3. Add MTR test cases.	2023-09-05 17:17:20 +03:00
Denis Khalikov	add3a57e8d	MCOL-5539 Put table on small side if it was involved in prev.join. (#2945 )	2023-09-05 12:19:43 +03:00
Andrey Piskunov	d586975da7	Rename a limit var + change error message (#2946 ) * Rename a limit var + change error message * Adjust the test	2023-09-05 12:19:15 +03:00
mariadb-AndreyPiskunov	05547f2342	Add a limit (as runtime value) for long in queries	2023-08-21 10:38:46 +03:00
mariadb-AndreyPiskunov	6ff121a91c	Replace recursion with iteration in ParseTree (and some related walkers)	2023-08-21 10:36:41 +03:00
drrtuy	f55d41c079	Merge pull request #2912 from tntnatbry/MCOL-5005 MCOL-5005 Add charset number to system catalog.	2023-08-15 22:22:21 +02:00
Gagan Goel	d50a0fa2e6	MCOL-5005 Add charset number to system catalog - Part 2. 1. Extend the calpontsys.syscolumn system catalog table with a new column, 'charsetnum'. 'charsetnum' field is set to the 'number' member of the 'charset_info_st' struct defined in the server in m_ctype.h. For CHAR/VARCHAR/TEXT column types, 'charset_info_st' is initialized to the charset/collation of the column, which is set at the column-level or at the table-level in the DDL. For BLOB/VARBINARY binary column types, 'charset_info_st' is initialized to my_charset_bin (charsetnum=63). For all other column types, charsetnum is set to 0. 2. Add support for the newly added 'charsetnum' column in the automatic system catalog upgrade logic in dbbuilder. For existing table definitions, charsetnum for the column is defaulted to 0. 3. Add MTR test case that creates a few table definitions with a range of charset/collation combinations and queries the calpontsys.syscolumn system catalog table with the charsetnum field for the columns in the table DDLs.	2023-08-15 17:21:47 +00:00
mariadb-AlexeyVorovich	64f1d541d0	MCOL-5519: new defaults in columnstore.cnf (#2894 ) feat(charset)!: utf8 is a new charset default and utf8_general_ci is a new collation default in the engine configuration file shipped --------- Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com> Co-authored-by: mariadb-DanielLee <daniel.lee@mariadb.com>	2023-08-15 18:04:32 +03:00
Gagan Goel	712d34a407	MCOL-4988 Table lock remained after DML failure due to DBRM in read-only mode. DMLProcessor functor earlier did not check if the DBRM was in read-only mode. This allowed DML statements to continue execution to the point where it locks the table and then sends the statement down to the WriteEngineServer, which ultimately returns back in an error state to DMLProc when it fails to perform BRM updates due to DBRM in read-only mode. This caused a lingering table lock in the system which could only be cleared on a system restart. As a fix, we add a check in the DMLProcessor functor to detect if DBRM is in read only mode, and if so, return back early in the execution of the DML statement.	2023-08-15 10:25:27 -04:00
Theresa Hradilak	48562e41f9	feat(datatypes): MCOL-4632 and MCOL-4648, fix cast leads to NULL. Remove redundant cast. As C-style casts with a type name in parantheses are interpreted as static_casts this literally just changes the interpretation around (and forces an implicit cast to match the return value of the function). Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency. Make consistent with relation between BIGINTNULL and BIGINTEMPTYROW & make adapted cast behaviour due to NULL markers more intuitive. (After this change we can simply block the highest possible uint64_t value and if a cast results in it, print the next lower value (2^64 - 2). Previously, (2^64 - 1) was able to be printed, but (2^64 - 2) as being blocked by the UBIGINTNULL constant was not, making finding the appropiate replacement value to give out more confusing. Introduce MAX_MCS_UBIGINT and MIN_MCS_BIGINT and adapt casts. Adapt casting to BIGINT to remove NULL marker error. Add bugfix regression test for MCOL 4632 Add regression test for mcol_4648 Revert "Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency." This reverts commit 83eac11b18937ecb0b4c754dd48e4cb47310f620. Due to backwards compatability issues. Refactor casting to MCS[U]Int to datatype functions. Update regression tests to include other affected datatypes. Apply formatting. Refactor according to PR review Remove redundant new constant, switch to using already existing constant. Adapt nullstring casting to EMPTYROW markers for backwards compatability. Adapt tests for backward compatability behaviour allowing text datatypes to be casted to EMPTYROW constant. Adapt mcol641-functions test according to bug fix. Update tests according to new expected behaviour. Adapt tests to new understanding of issue. Update comments/documentation for MCOL_4632 test. Adapt to new cast limit logic. Make bracketing consistent. Adapt previous regression test to new expected behaviour.	2023-08-11 13:00:30 +00:00
Denis Khalikov	896e8dd769	MCOL-5522 Properly process pm join result count. (#2909 ) This patch: 1. Properly processes situation when pm join result count is exceeded. 2. Adds session variable 'columnstore_max_pm_join_result_count` to control the limit.	2023-08-04 16:55:45 +03:00
Gagan Goel	4f580d109d	Fix a compiler error related to signed v/s unsigned integer comparison. (#2915 )	2023-08-04 16:54:40 +03:00
Gagan Goel	a36ea6dbb4	MCOL-5005 Add charset number to system catalog - Part 1. This patch improves/fixes the existing handling of CHARSET and COLLATION symbols in the ColumnStore DDL parser. Also, add fCollate and fCharsetNum member variables to the ddlpackage::ColumnType class.	2023-07-28 18:36:53 -04:00
drrtuy	1a49a09af3	Merge pull request #2878 from denis0x0D/MCOL-5514_dev_1 MCOL-5514 Parallel disk join step	2023-07-25 14:32:13 +01:00
Leonid Fedorov	65cde8c894	feature: pron (#2908) * feature: Special dictionary, we can pass with session variable to modify codepaths and behaviour for testing and debugging	2023-07-21 14:02:03 +03:00
Denis Khalikov	2a66ae2ed1	MCOL-5514 Parallel disk join step.	2023-07-11 14:05:14 +03:00
Sergei Golubchik	ebfb9face2	compiler failures with gcc 12.x a workaround for something that looks like a bug in a compiler. Fixes errors like In file included from /usr/include/c++/12/string:40, from /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:26: In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type, const char_type, std::size_t)’, inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT, const _CharT, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:423:21, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.tcc:532:22, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::replace(size_type, size_type, const _CharT, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:2171:19, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::insert(size_type, const _CharT) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:1928:22, inlined from ‘virtual std::string funcexp::Func_format::getStrVal(rowgroup::Row&, funcexp::FunctionParm&, bool&, execplan::CalpontSystemCatalog::ColType&)’ at /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:2008:17: /usr/include/c++/12/bits/char_traits.h:431:56: error: ‘void __builtin_memcpy(void, const void, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets 3 and [2, 2147483645] may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict] 431 | return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n ); $ gcc --version gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0	2023-07-04 12:58:18 -04:00
Leonid Fedorov	501da394ca	Replace std::set contains method with count to support Rocky/RHEL/Alma 8 where the std::set in the stock STL does not have contains method	2023-07-04 12:58:18 -04:00
Sergei Golubchik	a8be4a3787	compiler warnings like dbcon/joblist/batchprimitiveprocessor-jl.cpp:893:54: error: pointer used after ‘void operator delete [](void*, std::size_t)’ [-Werror=use-after-free] 893 | joinResults.reset(new vector<uint32_t>[8192]); | ^	2023-07-04 12:58:18 -04:00
Denis Khalikov	2aba28d855	Merge pull request #2851 from denis0x0D/MCOL-5477 MCOL-5477 Disk join step improvement.	2023-06-26 11:02:20 +03:00
Denis Khalikov	1f190a6e75	MCOL-5477 Disk join step improvement. This patch: 1. Handles corner case when the bucket exceeded the memory limit, but we cannot redistribute the data in this bucket into new buckets based on a hash algorithm, because the rows have the same values. 2. Adds force option for disk join step. 3. Adds an option to control the depth of the partition tree.	2023-06-23 18:40:15 +03:00
Denis Khalikov	024e6bd358	MCOL-5512 Fix for post join filter. This patch fixes certain situations where post join filter is not applying.	2023-06-09 11:15:05 +03:00
Roman Nozdrin	62dc392476	MCOL-5499 Enable ControlFlow for same node communication processing path to avoid DEC queue overloading (#2848 )	2023-06-07 15:41:59 +03:00
Leonid Fedorov	8f93fc3623	MCOL-5493: First portion of UBSan fixes (#2842 ) Multiple UB fixes	2023-06-02 17:02:09 +03:00
Gagan Goel	c598a9bbed	MCOL-5480 LOAD DATA INFILE incorrectly loads values for MEDIUMINT datatype. Internal memory representation of MEDIUMINT datatype uses 24 bits. This is true for both MariaDB server as well as ColumnStore. MCS plugin code uses TypeHandlerSInt24 and TypeHandlerUInt24 classes to respectively convert the binary representation of the signed and unsigned MEDIUMINT values passed by the server to the plugin. The plugin then outputs the text representation of these values into an open file descriptor which is piped to cpimport for the final load into the MCS db files. The TypeHandlerXInt24 classes were earlier incorrectly using WriteBatchField::ColWriteBatchXInt32() functions which operate on a 4 byte buffer. This resulted in incorrect parsing of MEDIUMINT values. As a fix, we implement WriteBatchField::ColWriteBatchXInt24() functions which correctly handle the 24 bit input buffer used for MEDIUMINT datatype.	2023-05-23 16:00:05 -04:00
Roman Nozdrin	de7ba854bd	Merge pull request #2840 from tntnatbry/MCOL-5491 MCOL-5491 Enable StringStore for long strings in JSON_ARRAYAGG processing.	2023-05-17 12:12:03 +01:00
Gagan Goel	87eb875379	MCOL-5491 Enable StringStore for long strings in JSON_ARRAYAGG processing. This patch is the JSON_ARRAYAGG clone of the changes done in MCOL-5429 where we enabled usage of StringStore for long strings in GROUP_CONCAT() processing to reduce memory footprint of PrimProc and thus avoiding a potential OS triggered OOM crash.	2023-05-12 19:45:02 +00:00
Gagan Goel	1477b28ee9	MCOL-5357 Fix TPC-DS query error "MCS-3009: Unknown column '.<colname>'". For the following query: select item from ( select item from (select a as item from t1) tt union all select item from (select a as item from t1) tt ) ttt; There is an if predicate in buildSimpleColFromDerivedTable() that compares the outermost query field name (ttt.item) to the returned column list of the inner query (tt.item) when building the returned column list of the outer most query. In the above query example, the inner query field name is an alias set in the inner most query and is set to "`tt`.`item`", while the outermost query field name is set to "item". The use of backticks "`" in the inner query alias is causing the execution to not enter the if block which creates the SimpleColumn for the outermost query field name. As a fix, we strip off the backticks from the inner query alias.	2023-05-03 16:06:20 +00:00
Gagan Goel	0be1c3dc8f	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. 1. Input and output RowGroup's used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits causing the OS OOM to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is an uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to an uint32_t.	2023-05-01 13:06:23 -04:00
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822 )" (#2828 ) This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
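
Commit 84148cbe4c in the log above describes doing intermediate calculations in int128_t when the target is UBIGINT and checking for overflow before converting. The following is only a rough sketch of that idea, not the code from the patch: the function names are hypothetical, and it assumes a compiler that provides the `__int128` extension (GCC or Clang). Mixed signed/unsigned addition is widened to signed 128 bits; the product of two unsigned operands can approach 2^128, so this sketch widens that case to unsigned 128 bits instead.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper names; they are not taken from the ColumnStore sources.
// __int128 is a GCC/Clang extension, assumed to be available here.

// Signed + unsigned 64-bit: every such sum fits in a signed 128-bit value,
// so the range check can be done after the addition.
bool addToUBigInt(int64_t a, uint64_t b, uint64_t& out)
{
  const __int128 wide = static_cast<__int128>(a) + static_cast<__int128>(b);
  if (wide < 0 || wide > static_cast<__int128>(std::numeric_limits<uint64_t>::max()))
    return false;  // result does not fit in an unsigned 64-bit value
  out = static_cast<uint64_t>(wide);
  return true;
}

// Unsigned * unsigned 64-bit: the product fits in an unsigned 128-bit value.
bool mulToUBigInt(uint64_t a, uint64_t b, uint64_t& out)
{
  const unsigned __int128 wide = static_cast<unsigned __int128>(a) * b;
  if (wide > std::numeric_limits<uint64_t>::max())
    return false;  // result does not fit in an unsigned 64-bit value
  out = static_cast<uint64_t>(wide);
  return true;
}

// Small self-check showing how the helpers are meant to be called.
inline void overflowSketchSelfCheck()
{
  uint64_t r = 0;
  assert(addToUBigInt(-1, 5, r) && r == 4);
  assert(!addToUBigInt(-6, 5, r));  // negative result is rejected
}
```

Returning a flag rather than a wrapped value leaves the caller free to decide how an out-of-range result is reported.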


1618 Commits
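
Commit c598a9bbed in the log above explains that MEDIUMINT values occupy 24 bits and were misparsed because they were read through functions that operate on a 4-byte buffer. As an illustration only, the sketch below decodes a 3-byte buffer into 32-bit integers. The helper names are hypothetical (the actual fix is in the WriteBatchField::ColWriteBatchXInt24() functions named in the commit), and the least-significant-byte-first layout is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not the functions added by the commit.
// Assumption: the three bytes are stored least-significant byte first.

// Reads exactly three bytes; a fourth byte is never touched.
uint32_t readUInt24(const unsigned char* buf)
{
  return static_cast<uint32_t>(buf[0]) | (static_cast<uint32_t>(buf[1]) << 8) |
         (static_cast<uint32_t>(buf[2]) << 16);
}

// Same read, followed by sign extension from bit 23.
int32_t readSInt24(const unsigned char* buf)
{
  const int32_t v = static_cast<int32_t>(readUInt24(buf));  // always <= 0xFFFFFF
  return (v & 0x800000) ? v - 0x1000000 : v;
}

// Small self-check showing the intended use.
inline void int24SketchSelfCheck()
{
  const unsigned char minusOne[3] = {0xFF, 0xFF, 0xFF};
  assert(readSInt24(minusOne) == -1);
  assert(readUInt24(minusOne) == 16777215u);
}
```

Reading a 24-bit value as if it were 32 bits pulls in one byte that belongs to whatever follows in memory, which is consistent with the incorrect values the commit message reports.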