mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-04 04:42:30 +03:00

Author	SHA1	Message	Date
Gagan Goel	55d4214429	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. (#2823) 1. Input and output RowGroups used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits, causing the OS OOM killer to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is a uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short-term fix, we saturate the value assigned to colWidth to INT32_MAX. The proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to a uint32_t.	2023-04-22 00:43:29 +03:00
Leonid Fedorov	030144127e	Remove boost shared array [develop 23.02] (#2812) * remove boost/shared_array include * replace boost::shared_array<T> with std::shared_ptr<T[]>	2023-04-17 20:56:09 +03:00
Leonid Fedorov	f1697c261e	MCOL-5385 set data extermination [develop-23.02] (#2813) * Delete RowGroup::setData and make Pointer ctor explicit * some push_backs replaced with emplace_backs * Fixes of review notes	2023-04-16 15:57:39 +03:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
mariadb-AndreyPiskunov	b57d2c30fe	Minor fixes	2022-10-31 14:56:32 +02:00
mariadb-AndreyPiskunov	1714b75434	Non-working attempt to do MCOL-5227	2022-10-31 14:56:32 +02:00
Leonid Fedorov	d2432f9bf6	get rid of pointers for 128 fields	2022-08-26 15:12:22 +00:00
mariadb-AndreyPiskunov	0863ecd279	Replace getBinaryField	2022-08-25 18:21:43 +03:00
David.Hall	272246e9fa	Merge branch 'develop' into MCOL-4841	2022-06-09 16:58:33 -05:00
david.hall	3b6449842f	Merge branch 'develop' into MCOL-4841 # Conflicts: # exemgr/main.cpp # oam/etc/Columnstore.xml.singleserver # primitives/primproc/primproc.cpp	2022-06-09 10:07:26 -05:00
Andrey Piskunov	c7e67aedd9	Renamed variables + removed server tests	2022-06-03 15:30:25 +03:00
Andrey Piskunov	c5fa27475d	Welford algorithm for STD and VAR. The naive algorithm for calculating STD and VAR is subject to catastrophic cancellation. The well-known Welford's algorithm is used instead.	2022-06-03 15:29:30 +03:00
Gagan Goel	973e5024d8	MCOL-4957 Fix performance slowdown for processing TIMESTAMP columns. Part 1: As part of MCOL-3776 to address a synchronization issue while accessing the fTimeZone member of the Func class, mutex locks were added to the accessor and mutator methods. However, this slows down processing of TIMESTAMP columns in PrimProc significantly as all threads across all concurrently running queries would serialize on the mutex. This is because PrimProc only has a single global object for the functor class (class derived from Func in utils/funcexp/functor.h) for a given function name. To fix this problem: (1) We remove the fTimeZone as a member of the Func derived classes (hence removing the mutexes) and instead use the fOperationType member of the FunctionColumn class to propagate the timezone values down to the individual functor processing functions such as FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc. (2) To achieve (1), a timezone member is added to the execplan::CalpontSystemCatalog::ColType class. Part 2: Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime() and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds since the Unix epoch and broken-down representation. These functions in turn call the C library function localtime_r() which currently has a known bug of holding a global lock via a call to __tz_convert. This significantly reduces performance in multi-threaded applications where multiple threads concurrently call localtime_r(). More details on the bug: https://sourceware.org/bugzilla/show_bug.cgi?id=16145 This bug in localtime_r() caused processing of the Functors in PrimProc to slow down significantly since a query execution causes Functors code to be processed in a multi-threaded manner. 
As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime() and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion (done in dataconvert::timeZoneToOffset()) during the execution plan creation in the plugin. Note that localtime_r() is only called when the time_zone system variable is set to "SYSTEM". This fix also required changing the timezone type from a std::string to a long across the system.	2022-02-14 14:12:27 -05:00
David Hall	27dea733c5	MCOL4841 dev port run large join without OOM	2022-02-09 17:33:55 -06:00
Leonid Fedorov	04752ec546	clang format apply	2022-01-21 16:43:49 +00:00
Alexey Antipovsky	6a4140394d	[MCOL-4829] More accurate memory counting	2021-09-07 19:52:20 +03:00
Alexey Antipovsky	7fea3c988e	[MCOL-4829] Compression for the temp disk-based aggregation files	2021-09-02 19:30:25 +03:00
Leonid Fedorov	51a8ffcb6a	Fix sumavgoverflow.sql test	2021-07-09 22:41:28 +00:00
Leonid Fedorov	f81f743282	Replace underlying type for avg and sum for int types from long double to wide decimal	2021-07-08 17:04:43 +00:00
Alexander Barkov	9794f24369	MCOL-4801 Replace Row methods getStringLength() and getStringPointer() with getConstString()	2021-07-06 21:15:32 +04:00
Gagan Goel	8520f87237	MCOL-641 Cleanup.	2021-07-06 09:01:49 +00:00
Roman Nozdrin	bed0b7c6bc	MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, functional JOINs based on top of TypelessData	2021-06-24 08:07:23 +00:00
Alexey Antipovsky	0dedb7e628	Fix compilation warnings	2021-06-09 16:51:00 +03:00
Alexey Antipovsky	475104e4d3	[MCOL-4709] Disk-based aggregation * Introduce multigeneration aggregation * Do not save unused part of RGDatas to disk * Add IO error explanation (strerror) * Reduce memory usage while aggregating * introduce in-memory generations to better memory utilization * Try to limit the qty of buckets at a low limit * Refactor disk aggregation a bit * pass calculated hash into RowAggregation * try to keep some RGData with free space in memory * do not dump more than half of rowgroups to disk if generations are allowed, instead start a new generation * for each thread shift the first processed bucket at each iteration, so the generations start more evenly * Unify temp data location * Explicitly create temp subdirectories whether disk aggregation/join are enabled or not	2021-06-06 16:09:15 +03:00
Alexander Barkov	9608533d92	MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop mcsconfig.h and my_config.h have the following pre-processor definitions: 1. Conflicting definitions coming from the standard cmake definitions: - PACKAGE - PACKAGE_BUGREPORT - PACKAGE_NAME - PACKAGE_STRING - PACKAGE_TARNAME - PACKAGE_VERSION - VERSION 2. Conflicting definitions of other kinds: - HAVE_STRTOLL - this is dirt in MariaDB headers. Should be fixed in the server code. my_config.h erroneously performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1" in some cases. The former is not CMake compatible style. The latter is. 3. Non-conflicting definitions: Otherwise, mcsconfig.h and my_config.h should be mutually compatible, because both are generated by cmake on the same host machine. So they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc. Observations: - It's OK to include both mcsconfig.h and my_config.h provided that we suppress duplicate definitions of the above conflicting types #1 and #2. - There is no need to suppress duplicate definitions mentioned in #3, as they are compatible! - my_sys.h and m_ctype.h must always follow a CMake configuration header, either my_config.h or mcsconfig.h (or both). They must never be included without any preceding configuration header. This change makes sure that we resolve conflicts by: - either disallowing inclusion of mcsconfig.h and my_config.h at the same time - or by hiding conflicting definitions #1 and #2 (and restoring them later). - also, by making sure that my_sys.h and m_ctype.h always follow a CMake configuration file. Details: - idb_mysql.h can now be included only after my_config.h. An attempt to use idb_mysql.h with mcsconfig.h instead of my_config.h is caught by the "#error" preprocessor directive. - mariadb_my_sys.h can now be included only after mcsconfig.h. An attempt to use mariadb_my_sys.h without mcsconfig.h (e.g. with my_config.h) is also caught by "#error". 
- collation.h can now be included in two ways. It now has the following effective structure: #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H) // Remember current conflicting definitions on the preprocessor stack // Undefine current conflicting definitions #endif #include "mcsconfig.h" #include "m_ctype.h" #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H) # Restore conflicting definitions from the preprocessor stack #endif and can be included as follows: a. using only mcsconfig.h as a configuration header: // my_config.h must not be included so far #include "collation.h" b. using my_config.h as the first included configuration file: #define PREFER_MY_CONFIG_H // Force conflict resolution #include "my_config.h" // can be included directly or indirectly ... #include "collation.h" Other changes: - Adding helper header files utils/common/mcsconfig_conflicting_defs_remember.h utils/common/mcsconfig_conflicting_defs_restore.h utils/common/mcsconfig_conflicting_defs_undef.h to make conflict resolution easier. - Removing `#include "collation.h"` from a number of files, as it's automatically included from rowgroup.h. - Removing redundant `#include "utils_utf8.h"`. This change is not directly related to the problem being fixed, but it's nice to remove redundant directives for both collation.h and utils_utf8.h from all the files that do not really need them. (this change could probably have gone as a separate commit) - Changing my_init() to MY_INIT(argv[0]) in the MCS services sources. After the fix of the compilation failure it appeared that ColumnStore services compiled with the debug build crash due to recent changes in safemalloc. The crash happened in strcmp() with `my_progname` as an argument (where my_progname is a mysys global variable). This problem should probably be fixed on the server side as well to avoid passing NULL. But, the majority of MariaDB executable programs also use MY_INIT(argv[0]) rather than my_init(). 
So let's make MCS do like the other programs do.	2021-05-25 12:34:36 +04:00
David Hall	f4e6939139	MCOL-4643 dev 5 reset valOut after processing UDAF After a UDAF result has been inserted in the output stream, the valOut object needs to be reset to empty in preparation for the next value. Failing to do so may cause what should be a NULL value to erroneously take the last value inserted.	2021-04-30 10:57:40 -05:00
Alexander Barkov	362bfcd15e	MCOL-4361 Replace pow(10.0, (double)scale) expressions with a static dictionary lookup.	2021-04-09 12:41:04 +04:00
David Hall	0eee6cfc62	MCOL-4643 reset valOut after UDAF evaluation	2021-03-26 16:09:15 -05:00
Roman Nozdrin	5b9689ce55	MCOL-4478 MCS now rounds the last digits of an avg() result for wide-DECIMAL argument	2020-12-30 15:02:12 +00:00
Roman Nozdrin	5815c5c526	MCOL-4452 RowAggregationUMP2::doUDAF() now calls setUserData() using a correct UDAF context	2020-12-22 15:43:51 +00:00
Roman Nozdrin	178be69bc4	MCOL-4394 __float128 related code had been moved into a separate file Trim to double and to long double conversions for Decimal	2020-11-19 12:08:18 +00:00
Roman Nozdrin	c00daa93bd	MCOL-4172 MultiDistinctRowAggregation didn't honor multiple UDAF in projection ::doUDAF() doesn't crash anymore trying to access fRGContextColl[] elements that don't exist running RowAggregationMultiDistinct::doAggregate()	2020-11-18 13:53:15 +00:00
Roman Nozdrin	3eb26c0d4a	MCOL-4313 Introduced TSInt128 that is a storage class for int128 Removed uint128 from joblist/lbidlist.* Another toString() method for wide-decimal that is EMPTY/NULL aware Unified decimal processing in WF functions Fixed a potential issue in EqualCompData::operator() for wide-decimal processing Fixed some signedness warnings	2020-11-18 13:53:15 +00:00
Alexander Barkov	d5c6645ba1	Adding mcs_basic_types.h For now it consists of only: using int128_t = __int128; using uint128_t = unsigned __int128; All new primitive data types should go into this file in the future.	2020-11-18 13:53:15 +00:00
Alexander Barkov	129d5b5a0f	MCOL-4174 Review/refactor frontend/connector code	2020-11-18 13:53:15 +00:00
Roman Nozdrin	8de9764f84	MCOL-4172 Add support for wide-DECIMAL into statistical aggregate and regr_* UDAF functions The patch fixes wrong results returned when multiple UDAF exist in projection aggregate over wide decimal literals now works	2020-11-18 13:52:20 +00:00
David Hall	638202417f	MCOL-4171	2020-11-18 13:52:19 +00:00
Roman Nozdrin	bd0d5af123	Merge fixes.	2020-11-18 13:51:26 +00:00
Roman Nozdrin	eeebe83839	MCOL-641 Fixed the incorrect if-condition.	2020-11-18 13:51:26 +00:00
Roman Nozdrin	21a41738e1	MCOL-641 Simple aggregates work with GROUP BY column keys. Fixed constant column copy for binary columns in TNS.	2020-11-18 13:51:26 +00:00
Roman Nozdrin	e88cbe9bc1	MCOL-641 Simple aggregates support: min, max, sum, avg for wide-DECIMALs.	2020-11-18 13:51:25 +00:00
Roman Nozdrin	97ee1609b2	MCOL-641 Replaced NULL binary constants. DataConvert::decimalToString, toString, writeIntPart, writeFractionalPart are not templates anymore.	2020-11-18 13:47:44 +00:00
Roman Nozdrin	de85e21c38	MCOL-641 This commit cleans up Row methods and adds a couple of UTs for Row.	2020-11-18 13:47:02 +00:00
Roman Nozdrin	f73de30427	MCOL-641 This commit introduces GTest Suite into CS. Binary NULL magic now consists of a series of BINARYEMPTYROW-s + BINARYNULL in the end. ByteStream now has hexbyte alias. Added ColumnCommand::getEmptyRowValue to support 16 byte EMPTY values.	2020-11-18 13:47:01 +00:00
drrtuy	84f9821720	MCOL-641 Switched to DataConvert static methods in joblist code. Replaced BINARYEMPTYROW and BINARYNULL values. We need to have separate magic values for numeric and non-numeric binary types b/c numeric can't tolerate losing 0 used for magics previously. atoi128() now parses minus sign and produces negative values. RowAggregation::isNull() now uses Row::isNull() for DECIMAL.	2020-11-18 13:47:01 +00:00
drrtuy	0ff0472842	MCOL-641 sum() now works with DECIMAL(38) columns. TupleAggregateStep class method and buildAggregateColumn() now properly set result data type. doSum() now handles DECIMAL(38) in an appropriate manner. Low-level null related methods for new binary-based datatypes now handle magic values for binary-based DT.	2020-11-18 13:47:01 +00:00
Alexey Antipovsky	b25fee320a	Remove variable-length arrays (-Wvla)	2020-11-17 15:03:10 +03:00
Gagan Goel	2ba9263df4	Silence -Werror=implicit-fallthrough compiler errors - Patch from Monty. The patch also fixes some potential bugs due to missing break statements.	2020-06-26 12:32:57 -04:00
David Hall	f9078efbc6	MCOL-3536 Collation	2020-06-08 17:57:37 -05:00
David Hall	78ac310e42	MCOL-3536 Collation	2020-06-01 15:08:15 -05:00


88 Commits
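
Note on 55d4214429 (MCOL-5429, item 3): the short-term fix described there saturates a uint32_t value to INT32_MAX before storing it in an int32_t field. A minimal sketch of such a saturating conversion follows. The helper name saturateToInt32 is an assumption made for illustration; the actual code in buildAggregateColumn() may be written differently.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: clamp an unsigned 32-bit value into the int32_t range
// instead of letting the conversion wrap to a negative number.
constexpr std::int32_t saturateToInt32(std::uint32_t value)
{
  constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();
  return value > static_cast<std::uint32_t>(kMax)
             ? kMax
             : static_cast<std::int32_t>(value);
}
```

With this shape, a group_concat_max_len of 3gb (3221225472) maps to INT32_MAX (2147483647) rather than to a negative width, while values that already fit are passed through unchanged.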