mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-05 15:41:14 +03:00

Author	SHA1	Message	Date
Serguey Zefirov	6e995e2e80	fix: MCOL-5755: incorrect handling of BLOB (and TEXT) in GROUP BY BLOB fields did not work as grouping keys at all, they were assigned value NULL for any value, be it NULL or not. The fix is in the rowaggregation.cpp in the initMapping(), a switch/case branch was added to handle BLOB field copying there. Also, TEXT columns did not distinguish between NULL and empty string in the grouping algorithm, now they do. The fix is in the equals() function, now we specifically check for isNull() equality between values.	2024-07-11 11:03:05 +03:00
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support Adds a special column which helps to differentiate data and rollups of various depths and a simple logic to row aggregation to add processing of subtotals.	2023-09-26 17:01:53 +03:00
Leonid Fedorov	8f93fc3623	MCOL-5493: First portion of UBSan fixes (#2842) Multiple UB fixes	2023-06-02 17:02:09 +03:00
Gagan Goel	0be1c3dc8f	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. 1. Input and output RowGroup's used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits causing the OS OOM to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is an uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to an uint32_t.	2023-05-01 13:06:23 -04:00
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822)" (#2828) This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
Leonid Fedorov	f916e64927	No boost condition (#2822) This patch replaces boost primitives with stdlib counterparts.	2023-04-22 00:42:45 +03:00
Leonid Fedorov	c2d0fa24da	replace boost::shared_array<T> to std::shared_ptr<T[]>	2023-04-14 10:33:27 +00:00
Leonid Fedorov	a508b86091	remove boost/shared_array include	2023-04-14 09:42:50 +00:00
Leonid Fedorov	6c32c658d5	MCOL-5385: Delete RowGroup::setData and make Pointer ctor explicit (#2808) * Delete RowGroup::setData and make Pointer ctor explicit * some push_backs replaced with emplace_backs * Fixes of review notes	2023-04-13 03:55:30 +03:00
Leonid Fedorov	2e1394149b	MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792) * Fixes of bugs from ASAN warnings, part one * MQC as static library, with nifty counter for global map and mutex * Switch clang to 16 * link messageqcpp to execplan	2023-04-04 02:33:23 +03:00
Sergey Zefirov	b53c231ca6	MCOL-271 empty strings should not be NULLs (#2794) This patch improves handling of NULLs in textual fields in ColumnStore. Previously empty strings were considered NULLs and it could be a problem if data scheme allows for empty strings. It was also one of major reasons of behavior difference between ColumnStore and other engines in MariaDB family. Also, this patch fixes some other bugs and incorrect behavior, for example, incorrect comparison for "column <= ''" which evaluates to constant True for all purposes before this patch.	2023-03-30 21:18:29 +03:00
Roman Nozdrin	786b9da5b0	MCOL-5438 COUNT() in math causes SEGV	2023-03-09 20:35:38 +00:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
Leonid Fedorov	123c345b40	remove winport	2023-03-02 15:37:11 +00:00
Leonid Fedorov	d2432f9bf6	get rid of pointers for 128 fields	2022-08-26 15:12:22 +00:00
mariadb-AndreyPiskunov	0863ecd279	Replace getBinaryField	2022-08-25 18:21:43 +03:00
Leonid Fedorov	3919c541ac	New warnfixes (#2254) * Fix clang warnings * Remove vim tab guides * initialize variables * 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length * Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison * chars are unsigned on ARM, having if (ival < 0) always false * chars are unsigned by default on ARM and comparison with -1 is always true	2022-02-17 13:08:58 +03:00
Leonid Fedorov	04752ec546	clang format apply	2022-01-21 16:43:49 +00:00
Leonid Fedorov	01f3ceb437	replace header guards with #pragma once	2022-01-21 15:24:58 +00:00
Denis Khalikov	6393c6d019	MCOL-4810 Add support for missed operation for `longStrings`.	2021-10-28 10:02:02 +03:00
Roman Nozdrin	3de038c1da	MCOL-4876 This patch enables continuous buffer to be used by ColumnCommand and aligns BPP::blockData that in most cases was unaligned	2021-10-06 09:23:40 +00:00
Alexey Antipovsky	6a4140394d	[MCOL-4829] More accurate memory counting	2021-09-07 19:52:20 +03:00
Denis Khalikov	7bda598fbf	MCOL-4810 Redundant copying and wasting memory in PrimProc This patch eliminates a copying `long string`s into the bytestream.	2021-08-26 12:16:23 +03:00
David Hall	1113470551	MCOL-4738 AVG gives wrong results with strict_aliasing A fix that works with strict_aliasing	2021-07-07 13:08:32 -05:00
Alexander Barkov	8988253ff4	Merge pull request #2031 from mariadb-corporation/bar-develop-MCOL-4801 MCOL-4801 Replace Row methods getStringLength() and getStringPointer(…	2021-07-07 13:53:19 +04:00
David Hall	8332ab8974	MCOL-4738 AVG() returns a wrong result On AMD64 machines, the fpu is 80 bits. The unused bits must be masked for memcmp to work properly. For other architectures, we don't want to mask those bits.	2021-07-06 19:50:00 -05:00
Alexander Barkov	9794f24369	MCOL-4801 Replace Row methods getStringLength() and getStringPointer() to getConstString()	2021-07-06 21:15:32 +04:00
Roman Nozdrin	8c360a1a27	MCOL-4759 Upmerge for MCOL-4564 code that implements hash merging family to reduce performance penalty using MDB hashing functions	2021-06-24 14:48:01 +00:00
Roman Nozdrin	bed0b7c6bc	MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, functional JOINs based on top of TypelessData	2021-06-24 08:07:23 +00:00
Alexander Barkov	b3d6f62964	MCOL-4753 Performance problem in Typeless join	2021-06-10 09:26:26 +00:00
Alexey Antipovsky	475104e4d3	[MCOL-4709] Disk-based aggregation * Introduce multigeneration aggregation * Do not save unused part of RGDatas to disk * Add IO error explanation (strerror) * Reduce memory usage while aggregating * introduce in-memory generations to better memory utilization * Try to limit the qty of buckets at a low limit * Refactor disk aggregation a bit * pass calculated hash into RowAggregation * try to keep some RGData with free space in memory * do not dump more than half of rowgroups to disk if generations are allowed, instead start a new generation * for each thread shift the first processed bucket at each iteration, so the generations start more evenly * Unify temp data location * Explicitly create temp subdirectories whether disk aggregation/join are enabled or not	2021-06-06 16:09:15 +03:00
Alexander Barkov	bd4cbb542d	MCOL-4721 CHAR(1) is not collation-aware for GROUP/DISTINCT	2021-05-18 16:14:53 +04:00
Alexander Barkov	362bfcd15e	MCOL-4361 Replace pow(10.0, (double)scale) expressions with a static dictionary lookup.	2021-04-09 12:41:04 +04:00
Alexander Barkov	69911c2710	A joint patch for MCOL-4614, MCOL-4615, MCOL-4660 (decimal to string conversion) This patch fixes: - MCOL-4614 calShowPartitions() precision loss for huge narrow decimal - MCOL-4615 GROUP_CONCAT() precision loss for huge narrow decimal - MCOL-4660 Narrow decimal to string conversion is inconsistent about zero integral Changes: - Implementing Row::getDecimalField() - Removing double arithmetic from the code printing DECIMAL values in TypeHandlerXDecimal::format64() and GroupConcator::outputRow(). Using Decimal::toString() instead. - Rewriting Decimal::toStringTSInt64(). The old implementation was wrong, too complex and slow (used unnecessary memmove, memcpy). An additional cleanup: - Removing the ENGINE=COLUMNSTORE clause from tests for MCOL-4532 and MCOL-4640 type_decimal.test is combinations-aware. It's run two times with default_storage_engine=MyISAM and default_storage_engine=COLUMNSTORE. So the CREATE TABLE statements should not specify the engine explicitly. - Adding --disable_warnings in the old fixed test. We needed to suppress warnings when the MyISAM combination is being run. Previously the table was erroneously created with ENGINE=COLUMNSTORE even with the MyISAM combination run. So warnings were not generated.	2021-04-05 16:36:19 +04:00
Alexey Antipovsky	5080e1ae53	MCOL-4031 More accurate memory usage counting while sorting	2021-01-29 18:31:20 +03:00
Alexander Barkov	a687df48b9	MCOL-4065 DISTINCT is case sensitive This patch makes DISTINCT and GROUP BY collation aware.	2021-01-21 15:46:54 +04:00
Alexander Barkov	2ea73846b9	MCOL-4422 Remove mariadb.h and my_sys.h dependency from collation.h	2020-11-30 14:26:35 +04:00
Roman Nozdrin	3eb26c0d4a	MCOL-4313 Introduced TSInt128 that is a storage class for int128 Removed uint128 from joblist/lbidlist.* Another toString() method for wide-decimal that is EMPTY/NULL aware Unified decimal processing in WF functions Fixed a potential issue in EqualCompData::operator() for wide-decimal processing Fixed some signedness warnings	2020-11-18 13:53:15 +00:00
Alexander Barkov	d5c6645ba1	Adding mcs_basic_types.h For now it consists of only: using int128_t = __int128; using uint128_t = unsigned __int128; All new primitive data types should go into this file in the future.	2020-11-18 13:53:15 +00:00
Alexander Barkov	129d5b5a0f	MCOL-4174 Review/refactor frontend/connector code	2020-11-18 13:53:15 +00:00
Roman Nozdrin	844472d812	MCOL-4313 Very fragile but high speed approach with inline ASM GCC compiler uses aligned versions of SIMD instructions expecting aligned memory blocks that is hard to implement now	2020-11-18 13:52:20 +00:00
Roman Nozdrin	1c3a34a3d0	Dataconvert::decimalToString badly fails w/o 20th member of mcs_pow_10 so I returned it WF::percentile runtime threw an exception b/c of wrong DT deduced from its argument Replaced literals with constants Taught WF_sum_avg::checkSumLimit to use refs instead of values	2020-11-18 13:52:20 +00:00
David Hall	af80081c94	MCOL-4171 Some fixes	2020-11-18 13:52:20 +00:00
David Hall	c4d8516a47	MCOL-4171 Window functions with decimal(38)	2020-11-18 13:52:19 +00:00
David Hall	638202417f	MCOL-4171	2020-11-18 13:52:19 +00:00
Gagan Goel	d3bc68b02f	MCOL-641 Refactor initial extent elimination support. This commit also adds support in TupleHashJoinStep::forwardCPData, although we currently do not support wide decimals as join keys. Row estimation to determine large-side of the join is also updated.	2020-11-18 13:52:19 +00:00
Roman Nozdrin	bd0d5af123	Merge fixes.	2020-11-18 13:51:26 +00:00
Roman Nozdrin	17bad9eb0b	MCOL-641 Initial support for ORDER BY on wide DECIMALs.	2020-11-18 13:51:26 +00:00
Roman Nozdrin	51d77d74df	MCOL-641 Fix for GROUP BY on wide-DECIMALs.	2020-11-18 13:51:26 +00:00
Roman Nozdrin	f63611c422	MCOL-641 This commit adds support for group_concat w/o ORDER BY. Small refactoring in Row methods.	2020-11-18 13:51:26 +00:00
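
Note on commit 0be1c3dc8f (MCOL-5429), item 3: the message describes saturating a uint32_t server variable to INT32_MAX before storing it in an int32_t width field. A minimal sketch of that kind of saturating assignment follows. The helper name saturateToInt32 is hypothetical and is not taken from the ColumnStore source; the actual change in buildAggregateColumn() may be written differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper (not ColumnStore code): clamp an unsigned 32-bit length
// to the largest value that a signed 32-bit width field can represent,
// instead of letting the conversion wrap to a negative number.
inline int32_t saturateToInt32(uint32_t value)
{
  constexpr uint32_t kMax =
      static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
  // std::min keeps in-range values unchanged and caps everything above kMax.
  return static_cast<int32_t>(std::min(value, kMax));
}
```

With this sketch, a group_concat_max_len of 3 GB (3221225472) would be stored as INT32_MAX rather than a negative width, while smaller values pass through unchanged.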

100 Commits