mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-10-17 01:27:36 +03:00

Author	SHA1	Message	Date
Leonid Fedorov	4d7a6a0be5	perf(primproc) MCOL-5601: Initialize two fields once in ctor instead of calling makeConfig std::string fTmpDir = config::Config::makeConfig()->getTempFileDir(config::Config::TempDirPurpose::Aggregates); std::string fCompStr = config::Config::makeConfig()->getConfig("RowAggregation", "Compression");	2023-12-19 15:25:19 +03:00
Sergey Zefirov	8632c85ecf	feat(primproc,aggregation)!: Changes for ROLLUP with single-phase aggregation (#3025 ) The fix is simple: enable subtotals in single-phase aggregation and disable parallel processing when there are subtotals and aggregation is single-phase.	2023-11-28 17:33:02 +03:00
Roman Nozdrin	dfc9e89496	fix(rowstorage): SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock	2023-11-01 18:19:45 +00:00
Roman Nozdrin	eb744eafed	chore(datatypes): this refactors the placement of the main SQL data types enum to enable templates that are parametrized with this enum(see mcs_datatype_basic.h changes for more details).	2023-10-24 18:44:35 +03:00
Leonid Fedorov	86c1c5d537	fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure The newly added invariant checking that RGData knows the number of columns and fixed size columns was failing for disk-based aggregation workloads, leading them to provide a wrong result. (The assertion failure happened in RGData::getRow(uint32_t num, Row* row), which is called in the finalization of sub-aggregation results, necessary for merging partial results.) As the merging failed, duplicate results were output for disk-based aggregation queries. The assertion failure was caused by RGData::deserialize(ByteStream& bs, uint32_t defAmount) not setting rowSize and colCount if necessary (e.g. when the deserialization happens into a new, default RGData, which doesn't know anything about its structure yet). This is the case when the default constructor for RGData() is used, which sets rowSize and columnCount to 0 each. There are three code parts that make use of the default RGData() ctor. The fix is for the use in RowGroupStorage::loadRG(uint64_t rgid, std::unique_ptr<RGData>& rgdata, bool unlinkDump = false), where the default RGData object is used to directly deserialize a ByteStream into it. The deserialize method now checks if both rowSize and columnCount are 0 and, if so, sets both from the values read from the ByteStream. We should probably check the other two code parts making use of the default RGData ctor, too. This happens in joinpartition.cpp and tuplejoiner.cpp. --------- Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com>	2023-09-30 00:02:31 +03:00
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support Adds a special column which helps to differentiate data and rollups of various depths and a simple logic to row aggregation to add processing of subtotals.	2023-09-26 17:01:53 +03:00
Leonid Fedorov	8f93fc3623	MCOL-5493: First portion of UBSan fixes (#2842 ) Multiple UB fixes	2023-06-02 17:02:09 +03:00
Gagan Goel	0be1c3dc8f	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. 1. Input and output RowGroup's used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits causing the OS OOM to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is an uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to an uint32_t.	2023-05-01 13:06:23 -04:00
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822 )" (#2828 ) This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
Leonid Fedorov	f916e64927	No boost condition (#2822 ) This patch replaces boost primitives with stdlib counterparts.	2023-04-22 00:42:45 +03:00
Leonid Fedorov	c2d0fa24da	replace boost::shared_array<T> to std::shared_ptr<T[]>	2023-04-14 10:33:27 +00:00
Leonid Fedorov	a508b86091	remove boost/shared_array include	2023-04-14 09:42:50 +00:00
Leonid Fedorov	6c32c658d5	MCOL-5385: Delete RowGroup::setData and make Pointer ctor explicit (#2808 ) * Delete RowGroup::setData and make Pointer ctor explicit * some push_backs replaced with emplace_backs * Fixes of review notes	2023-04-13 03:55:30 +03:00
Leonid Fedorov	2e1394149b	MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792 ) * Fixes of bugs from ASAN warnings, part one * MQC as static library, with nifty counter for global map and mutex * Switch clang to 16 * link messageqcpp to execplan	2023-04-04 02:33:23 +03:00
Sergey Zefirov	b53c231ca6	MCOL-271 empty strings should not be NULLs (#2794 ) This patch improves handling of NULLs in textual fields in ColumnStore. Previously empty strings were considered NULLs and it could be a problem if data scheme allows for empty strings. It was also one of major reasons of behavior difference between ColumnStore and other engines in MariaDB family. Also, this patch fixes some other bugs and incorrect behavior, for example, incorrect comparison for "column <= ''" which evaluates to constant True for all purposes before this patch.	2023-03-30 21:18:29 +03:00
Roman Nozdrin	688b47d4e7	MCOL-5451 This resolves external GROUP BY result inconsistency issues Given that idx is a RH hashmap bucket number and info is intra-bucket idx the root cause is triggered by the difference of idx/hash pair calculation for a certain GROUP BY generation and for generation aggregations merging that takes place in RowAggStorage::finalize. This patch generalizes rowHashToIdx to leverage it in both cases mentioned above.	2023-03-25 15:04:16 +00:00
Roman Nozdrin	786b9da5b0	MCOL-5438 COUNT() in math causes SEGV	2023-03-09 20:35:38 +00:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
Leonid Fedorov	123c345b40	remove winport	2023-03-02 15:37:11 +00:00
david.hall	8642231666	Changes to compile local	2022-11-17 11:29:21 -06:00
mariadb-AndreyPiskunov	b57d2c30fe	Minor fixes	2022-10-31 14:56:32 +02:00
mariadb-AndreyPiskunov	1714b75434	Non working attempt to do MCOL-5227	2022-10-31 14:56:32 +02:00
Alexey Antipovsky	440101dfff	[MCOL-5213] Fix a rare IO error	2022-09-14 17:12:15 +03:00
Roman Nozdrin	568ac5ba7b	Merge pull request #2535 from mariadb-corporation/int128Fields Int128 fields	2022-08-28 17:42:15 +01:00
Leonid Fedorov	d2432f9bf6	get rid of pointers for 128 fields	2022-08-26 15:12:22 +00:00
mariadb-AndreyPiskunov	0863ecd279	Replace getBinaryField	2022-08-25 18:21:43 +03:00
Roman Nozdrin	72e264e8ef	MCOL-5199 This patch solves the overall performance degradation introduced with a new way of char columns hashing in aggregation code The patch disables padding that forces hasher to calculate over the whole 2k buffer. This patch also moves hashing code into the common place where it belongs.	2022-08-24 19:07:06 +00:00
Roman Nozdrin	20f57b713a	MCOL-5198 This patch enables RowStorage to dump data on disk using startNewGeneration if there is 50 Megs left free	2022-08-24 14:00:43 +00:00
Alexey Antipovsky	15ce531270	Randomly start a new generation if the free memory is less than 30%	2022-08-24 14:00:37 +00:00
Alexey Antipovsky	dca359c2ab	Fix excessive memory consumption at the last stage of aggregation	2022-08-18 14:00:53 +03:00
Roman Nozdrin	dd96e686c0	MCOL-5153 This patch replaces MDB collation aware hash function with the (#2488 ) exact functionality that does not use MDB hash function. This patch also takes a bit from Robin Hood hash map implementation forgotten that reduces hash function collision rate.	2022-08-07 02:36:03 +03:00
Roman Nozdrin	6b17c358c0	MCOL-5153 This increases the size of the multiplier in the guarding check in RowAggStorage::increaseSize() so that it doesn't throw w/o a reason (#2463 )	2022-07-22 10:19:36 -05:00
David.Hall	272246e9fa	Merge branch 'develop' into MCOL-4841	2022-06-09 16:58:33 -05:00
david.hall	3b6449842f	Merge branch 'develop' into MCOL-4841 # Conflicts: # exemgr/main.cpp # oam/etc/Columnstore.xml.singleserver # primitives/primproc/primproc.cpp	2022-06-09 10:07:26 -05:00
Andrey Piskunov	c7e67aedd9	Renamed variables + removed server tests	2022-06-03 15:30:25 +03:00
Andrey Piskunov	c5fa27475d	Welford algorithm for STD and VAR The naive algorithm for calculating STD and VAR is subject to catastrophic cancellation. The well-known Welford's algorithm is used instead.	2022-06-03 15:29:30 +03:00
Leonid Fedorov	fbd043b036	Fixing alignment for clang tests of rowgroup	2022-03-23 14:29:19 +00:00
Leonid Fedorov	3919c541ac	New warnfixes (#2254 ) * Fix clang warnings * Remove vim tab guides * initialize variables * 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length * Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison * chars are unsigned on ARM, having if (ival < 0) always false * chars are unsigned by default on ARM and comparison with -1 is always true	2022-02-17 13:08:58 +03:00
Gagan Goel	973e5024d8	MCOL-4957 Fix performance slowdown for processing TIMESTAMP columns. Part 1: As part of MCOL-3776 to address synchronization issue while accessing the fTimeZone member of the Func class, mutex locks were added to the accessor and mutator methods. However, this slows down processing of TIMESTAMP columns in PrimProc significantly as all threads across all concurrently running queries would serialize on the mutex. This is because PrimProc only has a single global object for the functor class (class derived from Func in utils/funcexp/functor.h) for a given function name. To fix this problem: (1) We remove the fTimeZone as a member of the Func derived classes (hence removing the mutexes) and instead use the fOperationType member of the FunctionColumn class to propagate the timezone values down to the individual functor processing functions such as FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc. (2) To achieve (1), a timezone member is added to the execplan::CalpontSystemCatalog::ColType class. Part 2: Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime() and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds since unix epoch and broken-down representation. These functions in turn call the C library function localtime_r() which currently has a known bug of holding a global lock via a call to __tz_convert. This significantly reduces performance in multi-threaded applications where multiple threads concurrently call localtime_r(). More details on the bug: https://sourceware.org/bugzilla/show_bug.cgi?id=16145 This bug in localtime_r() caused processing of the Functors in PrimProc to slowdown significantly since a query execution causes Functors code to be processed in a multi-threaded manner. 
As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime() and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion (done in dataconvert::timeZoneToOffset()) during the execution plan creation in the plugin. Note that localtime_r() is only called when the time_zone system variable is set to "SYSTEM". This fix also required changing the timezone type from a std::string to a long across the system.	2022-02-14 14:12:27 -05:00
David Hall	27dea733c5	MCOL4841 dev port run large join without OOM	2022-02-09 17:33:55 -06:00
Leonid Fedorov	04752ec546	clang format apply	2022-01-21 16:43:49 +00:00
Leonid Fedorov	01f3ceb437	replace header guards with #pragma once	2022-01-21 15:24:58 +00:00
Denis Khalikov	6393c6d019	MCOL-4810 Add support for missed operation for `longStrings`.	2021-10-28 10:02:02 +03:00
Roman Nozdrin	3de038c1da	MCOL-4876 This patch enables continuous buffer to be used by ColumnCommand and aligns BPP::blockData that in most cases was unaligned	2021-10-06 09:23:40 +00:00
Alexey Antipovsky	6a4140394d	[MCOL-4829] More accurate memory counting	2021-09-07 19:52:20 +03:00
Alexey Antipovsky	7fea3c988e	[MCOL-4829] Compression for the temp disk-based aggregation files	2021-09-02 19:30:25 +03:00
Roman Nozdrin	46cf13ffa8	Merge pull request #2101 from denis0x0D/MCOL-4810_2 MCOL-4810 Redundant copying and wasting memory in PrimProc	2021-08-27 14:05:51 +03:00
Denis Khalikov	7bda598fbf	MCOL-4810 Redundant copying and wasting memory in PrimProc This patch eliminates copying of `long string`s into the bytestream.	2021-08-26 12:16:23 +03:00
Leonid Fedorov	5c5f103f98	MCOL-4839: Fix clang build (#2100 ) * Fix clang build * Extern C returned to plugin_instance Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>	2021-08-23 10:45:10 -05:00
Leonid Fedorov	73e710ed52	Add ctest for google unittests	2021-08-02 19:41:04 +03:00
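
Note on c5fa27475d (Welford algorithm for STD and VAR): the commit message names the technique but the listing does not show the code. Below is a minimal C++ sketch of Welford's single-pass update, plus the standard pairwise merge of two partial results. The struct name `WelfordAccumulator` and all of its members are hypothetical and chosen for illustration; this is not the ColumnStore implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical accumulator for illustration only; not ColumnStore's class.
struct WelfordAccumulator
{
  uint64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // running sum of squared deviations from the mean

  // Welford's update: fold one value in without keeping sum(x^2),
  // which is what makes the naive formula lose precision.
  void add(double x)
  {
    ++count;
    const double delta = x - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean);
  }

  // Combine two partial results (the usual pairwise/parallel formula).
  void merge(const WelfordAccumulator& other)
  {
    if (other.count == 0)
      return;
    if (count == 0)
    {
      *this = other;
      return;
    }
    const double n1 = static_cast<double>(count);
    const double n2 = static_cast<double>(other.count);
    const double delta = other.mean - mean;
    mean += delta * n2 / (n1 + n2);
    m2 += other.m2 + delta * delta * n1 * n2 / (n1 + n2);
    count += other.count;
  }

  // Variance is undefined for too few values; this sketch simply asserts.
  double populationVariance() const
  {
    assert(count > 0);
    return m2 / static_cast<double>(count);
  }

  double sampleVariance() const
  {
    assert(count > 1);
    return m2 / static_cast<double>(count - 1);
  }

  double populationStdDev() const
  {
    return std::sqrt(populationVariance());
  }
};
```

Feeding the values one at a time through `add` and reading `populationVariance()` at the end gives the same result as splitting the input, accumulating each part separately, and calling `merge`, which is the shape a multi-stage aggregation would presumably need.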

233 Commits
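
Note on dfc9e89496 (SplitMix64 PRNG): the commit replaces a standard-library Mersenne Twister setup with SplitMix64. As a rough illustration of why that generator is attractive (one 64-bit word of state, no locks, no system calls), here is a minimal C++ sketch of the publicly documented SplitMix64 step. The class name `SplitMix64`, its interface, and the `rollDie` helper are assumptions made for this example and are not taken from the rowstorage code.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Illustrative SplitMix64 generator; the interface is hypothetical and
// not the one used in ColumnStore's rowstorage code.
class SplitMix64
{
 public:
  using result_type = uint64_t;

  explicit SplitMix64(uint64_t seed) : state_(seed)
  {
  }

  static constexpr result_type min()
  {
    return 0;
  }

  static constexpr result_type max()
  {
    return std::numeric_limits<uint64_t>::max();
  }

  // One step: advance the state by a fixed odd constant, then scramble
  // it with two xor-shift-multiply rounds and a final xor-shift.
  result_type operator()()
  {
    uint64_t z = (state_ += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
  }

 private:
  uint64_t state_;
};

// Because the class provides result_type, min(), max() and operator(),
// it can be plugged into the standard distributions.
inline int rollDie(SplitMix64& rng)
{
  std::uniform_int_distribution<int> die(1, 6);
  return die(rng);
}
```

The generator is deterministic for a given seed, so each thread or storage instance can own its own object and no shared lock is needed. It is not suitable for cryptographic use.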