mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00

Author	SHA1	Message	Date
Leonid Fedorov	83c2408f8d	fix(join, threadpool): MCOL-5565: MCOL-5636: MCOL-5645: port from develop-23.02 to [develop] (#3128 ) * fix(threadpool): MCOL-5565 queries stuck in FairThreadScheduler. (#3100) Meta Primitive Jobs, .e.g ADD_JOINER, LAST_JOINER stuck in Fair scheduler without out-of-band scheduler. Add OOB scheduler back to remedy the issue. * fix(messageqcpp): MCOL-5636 same node communication crashes transmiting PP errors to EM b/c error messaging leveraged socket that was a nullptr. (#3106) * fix(threadpool): MCOL-5645 errenous threadpool Job ctor implictly sets socket shared_ptr to nullptr causing sigabrt when threadpool returns an error (#3125) --------- Co-authored-by: drrtuy <roman.nozdrin@mariadb.com>	2024-02-13 19:01:16 +03:00
Sergey Zefirov	ebcf43a517	fix(join, disk-based): MCOL-5597: large side read errors (#3117 ) The large side read errors mentioned there can be due to failure to close file stream properly. Some of the data may still reside in the file stream buffers, closing must flush it. The flush is an I/O operation and can fail, leading to partial write and subsequent partial read. This patch tries to provide better diagnostics.	2024-02-09 22:25:43 +03:00
Leonid Fedorov	0d1c72a563	compilation fix for gcc12 on known gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329	2024-01-04 11:43:03 +03:00
Leonid Fedorov	4d7a6a0be5	perf(primproc) MCOL-5601: Initilize two fields once in ctor instead of calling makeConfig std::string fTmpDir = config::Config::makeConfig()->getTempFileDir(config::Config::TempDirPurpose::Aggregates); std::string fCompStr = config::Config::makeConfig()->getConfig("RowAggregation", "Compression");	2023-12-19 15:25:19 +03:00
Leonid Fedorov	9d5ad925eb	fix(linkage) link libm to libmarias3	2023-12-18 14:10:14 +03:00
Serguey Zefirov	1f958c9ed2	MCOL-5625: Fixes json_query implementation Also extends func_json_value.test.	2023-12-12 15:45:03 +03:00
Denis Khalikov	865cca11c9	MCOL-5505 Add TypeHandler functions.	2023-11-30 01:47:13 +04:00
HanpyBin	fe597ec78c	MCOL-5505 add parquet support for cpimport and add mcs_parquet_ddl and mcs_parquet_gen tools	2023-11-30 01:47:13 +04:00
Serguey Zefirov	792aea2a7c	Fixes MCOL-5599 where LIKE operator never finishes This is a fix of logging subsystem, nothing else. The old code expanded an argument into string and advanced too little and, if expansion contained argument's index, it expanded it again. And again.	2023-11-29 19:17:16 +04:00
Sergey Zefirov	8632c85ecf	feat(primproc,aggregegation)!: Changes for ROLLUP with single-phase aggregation (#3025 ) The fix is simple: enable subtotals in single-phase aggregation and disable parallel processing when there are subtotals and aggregation is single-phase.	2023-11-28 17:33:02 +03:00
Denis Khalikov	76e4e13b80	fix(rowgroup,stringstore): MCOL-5597 Set length for `nullptr` string to 0. (#3027 )	2023-11-28 17:18:52 +03:00
Sergey Zefirov	5c9770d1e6	fix(funcexp): MCOL-5607: JSON function use crashes query execution (#3028 ) JSON functions were implemented violating an assumption of their pureness, as they should not have any state. This concrete patch fixes implementation of JSON_VALUE function.	2023-11-21 23:46:03 +03:00
Sergey Zefirov	69b8e1c779	feat(extent-elimination)!: re-enable extent-elimination for dictionary columns scanning This is "productization" of an old code that would enable extent elimination for dictionary columns. This concrete patch enables it, fixes perfomance degradation (main problem with old code) and also fixes incorrect behavior of cpimport.	2023-11-17 17:14:35 +03:00
drrtuy	67c842e792	Merge pull request #3017 from drrtuy/fix/MCOL-5472-urandom-mutex fix(rowstorage): MCOL-5472 SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock	2023-11-02 16:10:57 +02:00
Roman Nozdrin	dfc9e89496	fix(rowstorage): SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock	2023-11-01 18:19:45 +00:00
Roman Nozdrin	f7045457f2	chore(datatypes): refactoring math ops results domain check functionality	2023-10-25 09:12:54 +00:00
Roman Nozdrin	eb744eafed	chore(datatypes): this refactors the placement of the main SQL data types enum to enable templates that are parametrized with this enum(see mcs_datatype_basic.h changes for more details).	2023-10-24 18:44:35 +03:00
Leonid Fedorov	1f71847e99	fix(packaging) dh_missing: warning are treated as errors for buildbot debians dh_missing: warning: Compatibility levels before 10 are deprecated (level 9 in use) dh_missing: warning: usr/lib/x86_64-linux-gnu/libmessageqcpp.a exists in debian/tmp but is not installed to anywhere dh_missing: warning: usr/lib/x86_64-linux-gnu/libpron.a exists in debian/tmp but is not installed to anywhere so do not install static libraries as targets on CMake	2023-10-04 13:20:24 -04:00
Leonid Fedorov	86c1c5d537	fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure The new added invariant checking that RGData knows the number of columns and fixed size columns was failing for disk-based aggregation workloads, leading them to provide a wrong result. (The assertion failure happened in RGData::getRow(uint32_t num, Row* row) which is called in the finalization of sub-aggregation results, necessary for merging part results. As the merging failed, duplicate results were output for disk-based aggregation queries. The assertion failure was caused by RGData::deserialize(ByteStream& bs, uint32_t defAmount) not setting rowSize and colCount if necessary (e.g. when the deserialization happens into a new, default RGData, which doesn't know anything about its structure yet. This is the case when the default constructor for RGData() is used, which sets rowSize and columnCount to 0 each. There are three code parts that make use of the default RGData() ctor. The fix is for the use in RowGroupStorage::loadRG(uint64_t rgid, std::unique_ptr<RGData>& rgdata, bool unlinkDump = false), where the default RGData object is used to directly deserialize a ByteStream into it. The deserialize method now checks if both rowSize and columnCount are 0 and if yes sets the read values from the ByteStream for both. We should probably check the other two code parts making use of the default RGData ctor, too. This happens in joinpartition.cpp and tuplejoiner.cpp. --------- Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com>	2023-09-30 00:02:31 +03:00
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support Adds a special column which helps to differentiate data and rollups of various depts and a simple logic to row aggregation to add processing of subtotals.	2023-09-26 17:01:53 +03:00
mariadb-AlexeyVorovich	fd94ab5042	chore(logging): move cgroup /cgroup version log from constructor to getTotalMemory to avoid duplicate log as constructor is called per query	2023-09-25 22:17:09 +03:00
Gagan Goel	7f9c624626	MCOL-5573 Fix cpimport truncation of TEXT columns. 1. Restore the utf8_truncate_point() function in utils/common/utils_utf8.h that I removed as part of the patch for MCOL-4931. 2. As per the definition of TEXT columns, the default column width represents the maximum number of bytes that can be stored in the TEXT column. So the effective maximum length is less if the value contains multi-byte characters. However, if the user explicitly specifies the length of the TEXT column in a table DDL, such as TEXT(65535), then the DDL logic ensures that enough number of bytes are allocated (upto a system maximum) to allow upto that many number of characters (multi-byte characters if the charset for the column is multi-byte, such as utf8mb3).	2023-09-20 12:23:22 -04:00
Gagan Goel	931f2b36a1	MCOL-4931 Make cpimport charset-aware. (#2938 ) 1. Extend the following CalpontSystemCatalog member functions to set CalpontSystemCatalog::ColType::charsetNumber, after the system catalog update to add charset number to calpontsys.syscolumn in MCOL-5005: CalpontSystemCatalog::lookupOID CalpontSystemCatalog::colType CalpontSystemCatalog::columnRIDs CalpontSystemCatalog::getSchemaInfo 2. Update cpimport to use the CHARSET_INFO object associated with the charset number retrieved from the system catalog, for a dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate long strings that exceed the target column character length. 3. Add MTR test cases.	2023-09-05 17:17:20 +03:00
mariadb-AlexeyVorovich	5b4f06bf0d	Logging of memory (#2930 ) * -logging of memory WIP * -better log for cgroup case * -fix log * -display in GIB * add log for freememory for non CGROUP (to be discussed) * test repeated log entries * -added counter for every 1000 call. effectivly 15m * Name logginng period and inrease it, clear config files from PR, add .gitignore --------- Co-authored-by: pgmabv99 <alexey.vorovich@gmail.com> Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>	2023-09-05 15:46:29 +03:00
drrtuy	765dd46b61	fix(pp-threadpool): the workaround for a stuck tests001 in CI (#2931 ) CI ocassionaly stuck running test001 b/c PP threadpool endlessly reschedules meta jobs, e.g. BATCH_PRIMITIVE_CREATE, which ByteStreams were somehow damaged or read out. Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>	2023-08-18 00:02:31 +03:00
Theresa Hradilak	48562e41f9	feat(datatypes): MCOL-4632 and MCOL-4648, fix cast leads to NULL. Remove redundant cast. As C-style casts with a type name in parantheses are interpreted as static_casts this literally just changes the interpretation around (and forces an implicit cast to match the return value of the function). Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency. Make consistent with relation between BIGINTNULL and BIGINTEMPTYROW & make adapted cast behaviour due to NULL markers more intuitive. (After this change we can simply block the highest possible uint64_t value and if a cast results in it, print the next lower value (2^64 - 2). Previously, (2^64 - 1) was able to be printed, but (2^64 - 2) as being blocked by the UBIGINTNULL constant was not, making finding the appropiate replacement value to give out more confusing. Introduce MAX_MCS_UBIGINT and MIN_MCS_BIGINT and adapt casts. Adapt casting to BIGINT to remove NULL marker error. Add bugfix regression test for MCOL 4632 Add regression test for mcol_4648 Revert "Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency." This reverts commit 83eac11b18937ecb0b4c754dd48e4cb47310f620. Due to backwards compatability issues. Refactor casting to MCS[U]Int to datatype functions. Update regression tests to include other affected datatypes. Apply formatting. Refactor according to PR review Remove redundant new constant, switch to using already existing constant. Adapt nullstring casting to EMPTYROW markers for backwards compatability. Adapt tests for backward compatability behaviour allowing text datatypes to be casted to EMPTYROW constant. Adapt mcol641-functions test according to bug fix. Update tests according to new expected behaviour. Adapt tests to new understanding of issue. Update comments/documentation for MCOL_4632 test. Adapt to new cast limit logic. Make bracketing consistent. Adapt previous regression test to new expected behaviour.	2023-08-11 13:00:30 +00:00
drrtuy	1a49a09af3	Merge pull request #2878 from denis0x0D/MCOL-5514_dev_1 MCOL-5514 Parallel disk join step	2023-07-25 14:32:13 +01:00
Leonid Fedorov	65cde8c894	feature: pron (#2908 ) * feature: Special dictionary, we can pass with session veriable to modify codepaths and behaviour for testing and debugging	2023-07-21 14:02:03 +03:00
Leonid Fedorov	8d06822be5	atomic stop flag	2023-07-12 18:17:13 +03:00
Leonid Fedorov	bab29ff495	Simpler Config	2023-07-12 18:15:26 +03:00
Denis Khalikov	2a66ae2ed1	MCOL-5514 Parallel disk join step.	2023-07-11 14:05:14 +03:00
Sergei Golubchik	ebfb9face2	compiler failures with gcc 12.x a workaround for something that looks like a bug in a compiler. Fixes errors like In file included from /usr/include/c++/12/string:40, from /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:26: In static member function ‘static constexpr std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type, const char_type, std::size_t)’, inlined from ‘static constexpr void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT, const _CharT, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:423:21, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Allocator>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.tcc:532:22, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::replace(size_type, size_type, const _CharT, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:2171:19, inlined from ‘constexpr std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::insert(size_type, const _CharT) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std::allocator<char>]’ at /usr/include/c++/12/bits/basic_string.h:1928:22, inlined from ‘virtual std::string funcexp::Func_format::getStrVal(rowgroup::Row&, funcexp::FunctionParm&, bool&, execplan::CalpontSystemCatalog::ColType&)’ at /mnt/server/storage/columnstore/columnstore/utils/funcexp/func_math.cpp:2008:17: /usr/include/c++/12/bits/char_traits.h:431:56: error: ‘void __builtin_memcpy(void, const void, long unsigned int)’ accessing 9223372036854775810 or more bytes at offsets 3 and [2, 2147483645] may overlap up to 9223372036854775813 bytes at offset -3 [-Werror=restrict] 431 \| return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n ); $ gcc --version gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0	2023-07-04 12:58:18 -04:00
drrtuy	6d44d2e850	MCOL-5500 Remove another noisy printout. (#2886 ) Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2023-06-29 15:46:00 +03:00
drrtuy	385e580e9c	MCOL-5175 Bump overlooked libmariaS3 and added more ENV variables to fine tune initial installation on S3 (#2870 ) Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2023-06-27 17:33:39 +03:00
Denis Khalikov	2aba28d855	Merge pull request #2851 from denis0x0D/MCOL-5477 MCOL-5477 Disk join step improvement.	2023-06-26 11:02:20 +03:00
Denis Khalikov	1f190a6e75	MCOL-5477 Disk join step improvement. This patch: 1. Handles corner case when the bucket exceeded the memory limit, but we cannot redistribute the data in this bucket into new buckets based on a hash algorithm, because the rows have the same values. 2. Adds force option for disk join step. 3. Add a option to contol the depth of the partition tree.	2023-06-23 18:40:15 +03:00
Roman Nozdrin	79b636d853	MCOL-5500 Remove noisy printout from CGroupConfigurator method	2023-06-19 11:24:41 +00:00
Roman Nozdrin	375d162376	MCOL-5500 This patch adds cgroup v2 support with some sanity checks for (#2849 ) values reported by cgroups v1	2023-06-09 17:37:21 +03:00
Roman Nozdrin	cacbbee1c2	MCOL-5175 Increase the maximum effective length of S3 secret used as SHA256 key producing S3 signature (#2859 ) Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2023-06-07 15:24:02 +03:00
Leonid Fedorov	77eedd1756	MCOL-5503: Fix broken quarter (#2855 ) * Fix broken quarter function	2023-06-02 18:06:58 +03:00
Leonid Fedorov	8f93fc3623	MCOL-5493: First portion of UBSan fixes (#2842 ) Multiple UB fixes	2023-06-02 17:02:09 +03:00
Sergey Zefirov	0a2e9760ee	Fix for JSON_VALUE function to remove OOB stack access (#2852 ) MCOL-271 introduced a bug in JSON_VALUE that was discovered during implementation of ASAN builds. The changes here restore normal functionality. In short, changes in MCOL-271 introduced a local variable instead of reference to a string in ConstantColumn's fResult.strVal. The handling of ConstantColumn is different because ConstantColumn's value is used to initialize JSON path once. JSON path value holds pointer to data it does not own and if there are two or more rows the data can be corrupted and/or be out of stack bounds. The changes here introduce reference to a NullString that is held in the ConstantColumn's fResult.strVal and uses appropriate functions to obtain data from the NullString. CC's fResult is held by CC and strVal is also neither changing nor moving during operation, which allow JSON path to hold correct pointers during multi-row operation.	2023-05-31 15:30:40 +03:00
Leonid Fedorov	f18c556311	Fix gcc-13 warning and add support for building Fedora (#2845 )	2023-05-26 16:30:53 +03:00
Roman Nozdrin	e6e74c0be7	MCOL-5437 Fixes to follow the charset_info api change introduced by MDEV-30661	2023-05-08 18:57:36 +00:00
Roman Nozdrin	176eefcc57	MCOL-5456 codebase preparation for colab with CS 11.1	2023-05-08 18:54:47 +00:00
Gagan Goel	0be1c3dc8f	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. 1. Input and output RowGroup's used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits causing the OS OOM to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is an uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to an uint32_t.	2023-05-01 13:06:23 -04:00
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822 )" (#2828 ) This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
Leonid Fedorov	f916e64927	No boost condition (#2822 ) This patch replaces boost primitives with stdlib counterparts.	2023-04-22 00:42:45 +03:00
Leonid Fedorov	c2d0fa24da	replace boost::shared_array<T> to std::shared_ptr<T[]>	2023-04-14 10:33:27 +00:00
Leonid Fedorov	a508b86091	remove boost/shared_array include	2023-04-14 09:42:50 +00:00

1 2 3 4 5 ...

1152 Commits