mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-01 06:21:41 +03:00

Author	SHA1	Message	Date
drrtuy	1c297b9e9e	feat(): dangling pointer/ref issue has been solved for both RGData and BS	2024-12-13 15:56:28 +00:00
drrtuy	937d09768b	feat(): propagate long strings SP type change	2024-12-04 23:45:05 +00:00
drrtuy	789a382be2	feat(): use boost::make_shared b/c most distros can't do allocate_shared for array types.	2024-12-03 22:40:53 +00:00
drrtuy	5383e7c5a2	feat(RGData,StringStore): add counting allocator capabilities to those ctors used in BPP::execute()	2024-11-30 18:51:29 +00:00
Gagan Goel	55d4214429	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. (#2823) 1. Input and output RowGroups used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits, causing the OS OOM killer to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is a uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to a uint32_t.	2023-04-22 00:43:29 +03:00
Leonid Fedorov	030144127e	Remove boost shared array [develop 23.02] (#2812 ) * remove boost/shared_array include * replace boost::shared_array<T> to std::shared_ptr<T[]>	2023-04-17 20:56:09 +03:00
Leonid Fedorov	f1697c261e	MCOL-5385 set data extermination [develop-23.02] (#2813 ) * Delete RowGroup::setData and make Pointer ctor explicit * some push_backs replaced with emplace_backs * Fixes of review notes	2023-04-16 15:57:39 +03:00
Leonid Fedorov	2f153184c3	Fixes of bugs from ASAN warnings, part one (#2796 )	2023-03-30 18:29:04 +03:00
Roman Nozdrin	a1d20d82d5	MCOL-5451 This resolves external GROUP BY result inconsistency issues (#2791 ) Given that idx is a RH hashmap bucket number and info is intra-bucket idx the root cause is triggered by the difference of idx/hash pair calculation for a certain GROUP BY generation and for generation aggregations merging that takes place in RowAggStorage::finalize. This patch generalizes rowHashToIdx to leverage it in both cases mentioned above.	2023-03-28 19:10:41 +03:00
Roman Nozdrin	7f3d540841	MCOL-5438 COUNT() in math causes SEGV (#2769 ) Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2023-03-10 19:32:17 +03:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
Leonid Fedorov	123c345b40	remove winport	2023-03-02 15:37:11 +00:00
david.hall	8642231666	Changes to compile local	2022-11-17 11:29:21 -06:00
mariadb-AndreyPiskunov	b57d2c30fe	Minor fixes	2022-10-31 14:56:32 +02:00
mariadb-AndreyPiskunov	1714b75434	Non working attempt to do MCOL-5227	2022-10-31 14:56:32 +02:00
Alexey Antipovsky	440101dfff	[MCOL-5213] Fix a rare IO error	2022-09-14 17:12:15 +03:00
Roman Nozdrin	568ac5ba7b	Merge pull request #2535 from mariadb-corporation/int128Fields Int128 fields	2022-08-28 17:42:15 +01:00
Leonid Fedorov	d2432f9bf6	get rid of pointers for 128 fields	2022-08-26 15:12:22 +00:00
mariadb-AndreyPiskunov	0863ecd279	Replace getBinaryField	2022-08-25 18:21:43 +03:00
Roman Nozdrin	72e264e8ef	MCOL-5199 This patch solves the overall performance degradation introduced with a new way of char columns hashing in aggregation code. The patch disables padding that forces hasher to calculate over the whole 2k buffer. This patch also moves hashing code into the common place where it belongs.	2022-08-24 19:07:06 +00:00
Roman Nozdrin	20f57b713a	MCOL-5198 This patch enables RowStorage to dump data on disk using startNewGeneration if there is 50 Megs left free	2022-08-24 14:00:43 +00:00
Alexey Antipovsky	15ce531270	Randomly start a new generation if the free memory is less than 30%	2022-08-24 14:00:37 +00:00
Alexey Antipovsky	dca359c2ab	Fix excessive memory consumption at the last stage of aggregation	2022-08-18 14:00:53 +03:00
Roman Nozdrin	dd96e686c0	MCOL-5153 This patch replaces MDB collation aware hash function with the exact functionality that does not use MDB hash function. (#2488) This patch also brings in a forgotten piece of the Robin Hood hash map implementation that reduces the hash function collision rate.	2022-08-07 02:36:03 +03:00
Roman Nozdrin	6b17c358c0	MCOL-5153 This increases the size of the multiplier in the guarding check in RowAggStorage::increaseSize() so that it doesn't throw w/o a reason (#2463 )	2022-07-22 10:19:36 -05:00
David.Hall	272246e9fa	Merge branch 'develop' into MCOL-4841	2022-06-09 16:58:33 -05:00
david.hall	3b6449842f	Merge branch 'develop' into MCOL-4841 # Conflicts: # exemgr/main.cpp # oam/etc/Columnstore.xml.singleserver # primitives/primproc/primproc.cpp	2022-06-09 10:07:26 -05:00
Andrey Piskunov	c7e67aedd9	Renamed variables + removed server tests	2022-06-03 15:30:25 +03:00
Andrey Piskunov	c5fa27475d	Welford algorithm for STD and VAR. The naive algorithm for calculating STD and VAR is subject to catastrophic cancellation. The well-known Welford's algorithm is used instead.	2022-06-03 15:29:30 +03:00
Leonid Fedorov	fbd043b036	Fixing alignment for clang tests of rowgroup	2022-03-23 14:29:19 +00:00
Leonid Fedorov	3919c541ac	New warnfixes (#2254) * Fix clang warnings * Remove vim tab guides * initialize variables * 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length * Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison * chars are unsigned on ARM, having if (ival < 0) always false * chars are unsigned by default on ARM and comparison with -1 is always true	2022-02-17 13:08:58 +03:00
Gagan Goel	973e5024d8	MCOL-4957 Fix performance slowdown for processing TIMESTAMP columns. Part 1: As part of MCOL-3776 to address synchronization issue while accessing the fTimeZone member of the Func class, mutex locks were added to the accessor and mutator methods. However, this slows down processing of TIMESTAMP columns in PrimProc significantly as all threads across all concurrently running queries would serialize on the mutex. This is because PrimProc only has a single global object for the functor class (class derived from Func in utils/funcexp/functor.h) for a given function name. To fix this problem: (1) We remove the fTimeZone as a member of the Func derived classes (hence removing the mutexes) and instead use the fOperationType member of the FunctionColumn class to propagate the timezone values down to the individual functor processing functions such as FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc. (2) To achieve (1), a timezone member is added to the execplan::CalpontSystemCatalog::ColType class. Part 2: Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime() and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds since unix epoch and broken-down representation. These functions in turn call the C library function localtime_r() which currently has a known bug of holding a global lock via a call to __tz_convert. This significantly reduces performance in multi-threaded applications where multiple threads concurrently call localtime_r(). More details on the bug: https://sourceware.org/bugzilla/show_bug.cgi?id=16145 This bug in localtime_r() caused processing of the Functors in PrimProc to slow down significantly since a query execution causes Functors code to be processed in a multi-threaded manner. As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime() and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion (done in dataconvert::timeZoneToOffset()) during the execution plan creation in the plugin. Note that localtime_r() is only called when the time_zone system variable is set to "SYSTEM". This fix also required changing the timezone type from a std::string to a long across the system.	2022-02-14 14:12:27 -05:00
David Hall	27dea733c5	MCOL4841 dev port run large join without OOM	2022-02-09 17:33:55 -06:00
Leonid Fedorov	04752ec546	clang format apply	2022-01-21 16:43:49 +00:00
Leonid Fedorov	01f3ceb437	replace header guards with #pragma once	2022-01-21 15:24:58 +00:00
Denis Khalikov	6393c6d019	MCOL-4810 Add support for missed operation for `longStrings`.	2021-10-28 10:02:02 +03:00
Roman Nozdrin	3de038c1da	MCOL-4876 This patch enables a continuous buffer to be used by ColumnCommand and aligns BPP::blockData that in most cases was unaligned	2021-10-06 09:23:40 +00:00
Alexey Antipovsky	6a4140394d	[MCOL-4829] More accurate memory counting	2021-09-07 19:52:20 +03:00
Alexey Antipovsky	7fea3c988e	[MCOL-4829] Compression for the temp disk-based aggregation files	2021-09-02 19:30:25 +03:00
Roman Nozdrin	46cf13ffa8	Merge pull request #2101 from denis0x0D/MCOL-4810_2 MCOL-4810 Redundant copying and wasting memory in PrimProc	2021-08-27 14:05:51 +03:00
Denis Khalikov	7bda598fbf	MCOL-4810 Redundant copying and wasting memory in PrimProc. This patch eliminates copying of `long string`s into the bytestream.	2021-08-26 12:16:23 +03:00
Leonid Fedorov	5c5f103f98	MCOL-4839: Fix clang build (#2100 ) * Fix clang build * Extern C returned to plugin_instance Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>	2021-08-23 10:45:10 -05:00
Leonid Fedorov	73e710ed52	Add ctest for google unittests	2021-08-02 19:41:04 +03:00
Gagan Goel	3d557a2f1e	Merge pull request #2044 from dhall-MariaDB/MCOL-3738 MCOL-3738 COUNT(DISTINCT) with multiple parms	2021-07-12 07:34:56 -04:00
Leonid Fedorov	51a8ffcb6a	Fix sumavgoverflow.sql test	2021-07-09 22:41:28 +00:00
David Hall	76607be63a	MCOL-3738 COUNT(DISTINCT) with multiple parms Fixed regression Added a few more mtr tests	2021-07-09 09:07:03 -05:00
Leonid Fedorov	f81f743282	Replace underlying type for avg and sum for int types from long double to wide decimal	2021-07-08 17:04:43 +00:00
David Hall	1113470551	MCOL-4738 AVG gives wrong results with strict_aliasing. A fix that works with strict_aliasing	2021-07-07 13:08:32 -05:00
Alexander Barkov	8988253ff4	Merge pull request #2031 from mariadb-corporation/bar-develop-MCOL-4801 MCOL-4801 Replace Row methods getStringLength() and getStringPointer(…	2021-07-07 13:53:19 +04:00
David Hall	8332ab8974	MCOL-4738 AVG() returns a wrong result. On AMD64 machines, the FPU is 80 bits. The unused bits must be masked for memcmp to work properly. For other architectures, we don't want to mask those bits.	2021-07-06 19:50:00 -05:00
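
The short-term fix described in item 3 of commit 55d4214429 (saturating a uint32_t value such as group_concat_max_len to INT32_MAX before storing it in an int32_t field) can be sketched as follows. This is a minimal illustration only: the helper name saturateToInt32 is hypothetical and is not taken from the ColumnStore source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of ColumnStore: clamp an unsigned 32-bit
// value into the range of a signed 32-bit field instead of letting the
// conversion wrap to a negative number.
inline std::int32_t saturateToInt32(std::uint32_t value)
{
  constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();
  if (value > static_cast<std::uint32_t>(kMax))
  {
    return kMax;  // e.g. a 3gb setting is stored as INT32_MAX
  }
  return static_cast<std::int32_t>(value);
}
```

With a helper like this, a value of 3221225472 (3gb) would be stored as 2147483647, while any value that already fits in an int32_t is passed through unchanged.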

226 Commits
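
Commit c5fa27475d replaces the naive STD/VAR computation with Welford's algorithm. A minimal sketch of that technique is shown below, assuming a plain double accumulator; the struct and member names are hypothetical and do not reflect the types used in the actual aggregation code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>

// Hypothetical accumulator illustrating Welford's online algorithm.
// It updates the running mean and the sum of squared deviations (m2)
// one value at a time, which avoids subtracting two large, nearly
// equal sums as the naive sum-of-squares formula does.
struct WelfordAccumulator
{
  std::size_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the current mean

  void add(double x)
  {
    ++count;
    const double delta = x - mean;            // deviation from the old mean
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean);                 // uses the updated mean
  }

  // Population variance (VAR_POP); 0 for an empty input.
  double populationVariance() const
  {
    return count > 0 ? m2 / static_cast<double>(count) : 0.0;
  }

  // Sample variance (VAR_SAMP); 0 when fewer than two values were seen.
  double sampleVariance() const
  {
    return count > 1 ? m2 / static_cast<double>(count - 1) : 0.0;
  }
};
```

The standard deviation is the square root of the chosen variance. Because each step only works with deviations from the running mean, the result stays accurate even when the inputs are large and close together.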