mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-01 06:21:41 +03:00
Commit Graph

226 Commits

SHA1 Message Date
1c297b9e9e feat(): dangling pointer/ref issue has been solved for both RGData and BS 2024-12-13 15:56:28 +00:00
937d09768b feat(): propagate long strings SP type change 2024-12-04 23:45:05 +00:00
789a382be2 feat(): use boost::make_shared because most distros can't do allocate_shared for array types. 2024-12-03 22:40:53 +00:00
5383e7c5a2 feat(RGData,StringStore): add counting allocator capabilities to those ctors used in BPP::execute() 2024-11-30 18:51:29 +00:00
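The commit above adds counting-allocator support to the RGData and StringStore constructors used in BPP::execute(). As a rough sketch of the general idea only (the class `CountingAllocator` and its interface are hypothetical and are not ColumnStore's implementation), a standard-conforming allocator can report every allocation and deallocation to a shared atomic counter:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical allocator that reports every allocation and deallocation to a
// shared atomic byte counter owned by the caller.
template <typename T>
class CountingAllocator
{
 public:
  using value_type = T;

  explicit CountingAllocator(std::atomic<long long>* counter) noexcept : counter_(counter) {}

  // Rebinding constructor required by the Allocator requirements.
  template <typename U>
  CountingAllocator(const CountingAllocator<U>& other) noexcept : counter_(other.counter())
  {
  }

  T* allocate(std::size_t n)
  {
    T* p = std::allocator<T>{}.allocate(n);
    counter_->fetch_add(static_cast<long long>(n * sizeof(T)));
    return p;
  }

  void deallocate(T* p, std::size_t n) noexcept
  {
    std::allocator<T>{}.deallocate(p, n);
    counter_->fetch_sub(static_cast<long long>(n * sizeof(T)));
  }

  std::atomic<long long>* counter() const noexcept { return counter_; }

 private:
  std::atomic<long long>* counter_;
};

// Two allocators are interchangeable when they report to the same counter.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept
{
  return a.counter() == b.counter();
}

template <typename T, typename U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept
{
  return !(a == b);
}

// Usage: how many bytes does reserving n ints in a vector charge to the counter?
inline long long bytesChargedForReserve(std::size_t n)
{
  std::atomic<long long> used{0};
  CountingAllocator<int> alloc(&used);
  std::vector<int, CountingAllocator<int>> v(alloc);
  v.reserve(n);
  return used.load();
}
```

Because the counter is passed in by pointer, several containers can charge one shared budget, and the counter drops back to zero once every container using it has released its memory.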
55d4214429 MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. (#2823)
1. The input and output RowGroups used in the GROUP_CONCAT classes
currently allocate a raw memory buffer whose size equals the
declared width of the string datatype. As an example, for the
following query:
  SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with the default width, the input
RowGroup containing the target rows to be concatenated
allocates 64 KB of memory for every input row in the RowGroup.
This is wasteful, as actual field values in real workloads are
usually much smaller. We fix this by enabling the RowGroup to
use the StringStore when the RowGroup contains long strings.

2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer equals the row size
of the output RowGroup. For the above scenario, with the
default group_concat_max_len value of 1 MB (group_concat_max_len
is a server variable that sets the maximum length of the
GROUP_CONCAT string), the buffer size would be
(1 MB + 64 KB + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3 GB, this buffer
would be ~3 GB. If the runtime then creates several instances
of RowAggregation, PrimProc's total memory consumption could
exceed the hardware memory limit, causing the OS OOM killer to
terminate the process. We fix this problem by again enabling
the StringStore for the NULL row allocation.

3. In the plugin code in buildAggregateColumn(), an integer
overflow occurs when the server variable group_concat_max_len
(a uint32_t) is set to a value > INT32_MAX (such as 3 GB) and
is assigned to CalpontSystemCatalog::ColType::colWidth (an
int32_t). As a short-term fix, we saturate the value assigned
to colWidth at INT32_MAX. The proper fix would be to change
CalpontSystemCatalog::ColType::colWidth to a uint32_t.
2023-04-22 00:43:29 +03:00
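Point 3 of the commit above describes saturating the uint32_t server variable at INT32_MAX before storing it in the int32_t colWidth. A minimal sketch of that clamp (the function name `saturateColWidth` is hypothetical, not the plugin's code):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: clamp a uint32_t length into an int32_t column width
// instead of letting values above INT32_MAX wrap around to negative numbers.
inline int32_t saturateColWidth(uint32_t groupConcatMaxLen)
{
  constexpr uint32_t kMax = static_cast<uint32_t>(std::numeric_limits<int32_t>::max());
  return groupConcatMaxLen > kMax ? std::numeric_limits<int32_t>::max()
                                  : static_cast<int32_t>(groupConcatMaxLen);
}
```

Without the clamp, a 3 GB value (3221225472) converted to int32_t becomes -1073741824 on two's-complement targets, which is the overflow the commit describes.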
030144127e Remove boost shared array [develop 23.02] (#2812)
* remove boost/shared_array include

* replace boost::shared_array<T> with std::shared_ptr<T[]>
2023-04-17 20:56:09 +03:00
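The replacement above is mostly mechanical, because C++17's `std::shared_ptr<T[]>` already calls `delete[]` and provides `operator[]`. A small sketch of the C++17 form (the helper `makeBuffer` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Before (Boost):  boost::shared_array<unsigned char> buf(new unsigned char[n]);
// After (C++17):   std::shared_ptr<unsigned char[]>, released with delete[].
inline std::shared_ptr<unsigned char[]> makeBuffer(std::size_t n)
{
  // The trailing () value-initializes the array, so every byte starts at zero.
  return std::shared_ptr<unsigned char[]>(new unsigned char[n]());
}
```

Note that `std::make_shared` for array types only arrived in C++20, so code targeting C++17 has to pass a `new T[n]` expression to the `std::shared_ptr<T[]>` constructor, as above.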
f1697c261e MCOL-5385 set data extermination [develop-23.02] (#2813)
* Delete RowGroup::setData and make Pointer ctor explicit

* some push_backs replaced with emplace_backs

* Fixes for review notes
2023-04-16 15:57:39 +03:00
2f153184c3 Fixes of bugs from ASAN warnings, part one (#2796) 2023-03-30 18:29:04 +03:00
a1d20d82d5 MCOL-5451 This resolves external GROUP BY result inconsistency issues (#2791)
Given that idx is a Robin Hood (RH) hashmap bucket number and info is the
intra-bucket index, the root cause is a mismatch between the idx/hash pair
calculation used for a given GROUP BY generation and the one used when
merging generation aggregations in RowAggStorage::finalize.
This patch generalizes rowHashToIdx so that both cases use it.
2023-03-28 19:10:41 +03:00
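The fix above amounts to computing the (idx, info) pair in exactly one place. The sketch below illustrates that idea with a simplified split of a 64-bit hash into a bucket number and an info value; the struct, its field names, and the bit layout are assumptions and do not reproduce RowAggStorage's actual rowHashToIdx:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameters of a Robin Hood table; not ColumnStore's layout.
struct HashParams
{
  uint64_t mask;       // bucket count - 1 (bucket count is a power of two)
  uint32_t infoInc;    // base value added to the info field
  uint32_t infoShift;  // number of low hash bits that feed the info field
};

// Single source of truth for mapping a hash to (info, idx). Insertion and
// generation merging must both call this with the same parameters.
inline void rowHashToIdxSketch(uint64_t hash, const HashParams& p, uint32_t& info, uint64_t& idx)
{
  const uint64_t infoBits = hash & ((1ULL << p.infoShift) - 1);  // low bits -> info
  info = p.infoInc + static_cast<uint32_t>(infoBits);
  idx = (hash >> p.infoShift) & p.mask;  // remaining bits -> bucket number
}
```

When both code paths share one function, a row can no longer land in one bucket during aggregation and be looked up in another during the merge.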
7f3d540841 MCOL-5438 COUNT() in math causes SEGV (#2769)
Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
2023-03-10 19:32:17 +03:00
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
123c345b40 remove winport 2023-03-02 15:37:11 +00:00
8642231666 Changes to compile local 2022-11-17 11:29:21 -06:00
b57d2c30fe Minor fixes 2022-10-31 14:56:32 +02:00
1714b75434 Non-working attempt at MCOL-5227 2022-10-31 14:56:32 +02:00
440101dfff [MCOL-5213] Fix a rare IO error 2022-09-14 17:12:15 +03:00
568ac5ba7b Merge pull request #2535 from mariadb-corporation/int128Fields
Int128 fields
2022-08-28 17:42:15 +01:00
d2432f9bf6 get rid of pointers for 128-bit fields 2022-08-26 15:12:22 +00:00
0863ecd279 Replace getBinaryField 2022-08-25 18:21:43 +03:00
72e264e8ef MCOL-5199 This patch solves the overall performance degradation introduced by a new way of hashing char columns
in aggregation code
The patch disables the padding that forced the hasher to process the whole 2k buffer. This patch also moves the hashing code
into the common place where it belongs.
2022-08-24 19:07:06 +00:00
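A sketch of the idea behind disabling the padding in the commit above: hash only the bytes a CHAR value actually occupies rather than the whole fixed-size slot. FNV-1a is swapped in purely for illustration; it is not the hash ColumnStore uses, and `hashCharColumn` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 64-bit FNV-1a, used here only as a simple stand-in hash function.
inline uint64_t fnv1a(const unsigned char* data, std::size_t len)
{
  uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (std::size_t i = 0; i < len; ++i)
  {
    h ^= data[i];
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hash a CHAR value stored in a fixed-size slot using only its real length:
// a 3-byte value in a 2 KB slot costs 3 iterations instead of 2048, and
// whatever follows the value in the slot does not affect the result.
inline uint64_t hashCharColumn(const unsigned char* slot, std::size_t actualLen)
{
  return fnv1a(slot, actualLen);
}
```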
20f57b713a MCOL-5198 This patch enables RowStorage to dump data to disk
using startNewGeneration when only 50 MB of memory remains free
2022-08-24 14:00:43 +00:00
15ce531270 Randomly start a new generation if the free memory is less than 30% 2022-08-24 14:00:37 +00:00
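The two thresholds in the commits above (a hard floor of about 50 MB of free memory, and a randomized trigger once free memory drops below 30%) could be combined into one policy roughly as follows. The function name, the byte-based interface, and the default probability of 0.5 are assumptions; the commits do not say how the random decision is made:

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical policy: decide whether RowStorage should start a new
// generation (i.e. dump the current one to disk).
inline bool shouldStartNewGeneration(uint64_t freeBytes, uint64_t totalBytes, std::mt19937_64& rng,
                                     double probability = 0.5)
{
  constexpr uint64_t kHardFloor = 50ULL * 1024 * 1024;  // ~50 MB left free
  if (freeBytes <= kHardFloor)
    return true;  // always dump when memory is nearly exhausted
  if (freeBytes * 10 < totalBytes * 3)  // free memory below 30% of the total
    return std::bernoulli_distribution(probability)(rng);  // randomized trigger
  return false;  // no memory pressure
}
```

Randomizing the decision in the 30% band spreads generation switches out over time instead of having every aggregation dump at the same moment, which is one plausible motivation for the second commit.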
dca359c2ab Fix excessive memory consumption at the last stage of aggregation 2022-08-18 14:00:53 +03:00
dd96e686c0 MCOL-5153 This patch replaces the MDB collation-aware hash function with (#2488)
equivalent functionality that does not use the MDB hash function.
This patch also brings over a previously forgotten piece of the Robin Hood hash map implementation
that reduces the hash function collision rate.
2022-08-07 02:36:03 +03:00
6b17c358c0 MCOL-5153 This increases the size of the multiplier in the guarding check in RowAggStorage::increaseSize() so that it doesn't throw without reason (#2463) 2022-07-22 10:19:36 -05:00
272246e9fa Merge branch 'develop' into MCOL-4841 2022-06-09 16:58:33 -05:00
3b6449842f Merge branch 'develop' into MCOL-4841
# Conflicts:
#	exemgr/main.cpp
#	oam/etc/Columnstore.xml.singleserver
#	primitives/primproc/primproc.cpp
2022-06-09 10:07:26 -05:00
c7e67aedd9 Renamed variables + removed server tests 2022-06-03 15:30:25 +03:00
c5fa27475d Welford algorithm for STD and VAR
The naive algorithm for calculating STD and VAR is subject to catastrophic
cancellation, so the well-known Welford's algorithm is used instead.
2022-06-03 15:29:30 +03:00
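Welford's online update keeps a running count, mean, and sum of squared deviations (M2), which avoids subtracting two large, nearly equal sums. A minimal sketch (the struct name is hypothetical; ColumnStore's implementation differs in types and NULL handling):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Welford's online algorithm: one pass, numerically stable.
struct WelfordAccumulator
{
  uint64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the running mean

  void add(double x)
  {
    ++count;
    const double delta = x - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean);  // second factor uses the updated mean
  }

  // Empty inputs return 0.0 here for brevity; SQL would return NULL.
  double varPop() const { return count ? m2 / static_cast<double>(count) : 0.0; }
  double varSamp() const { return count > 1 ? m2 / static_cast<double>(count - 1) : 0.0; }
  double stddevPop() const { return std::sqrt(varPop()); }
};
```

For inputs 1e9+4, 1e9+7, 1e9+13 and 1e9+16 this yields the exact population variance 22.5, whereas the naive sum(x^2)/n - mean^2 formula works with squares near 1e18, where adjacent doubles are 128 apart, so rounding swamps the result.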
fbd043b036 Fixing alignment for clang tests of rowgroup 2022-03-23 14:29:19 +00:00
3919c541ac New warnfixes (#2254)
* Fix clang warnings

* Remove vim tab guides

* initialize variables

* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length

* Fix "ISO C++17 does not allow 'register' storage class specifier" for outdated bison

* chars are unsigned on ARM, making if (ival < 0) always false

* chars are unsigned by default on ARM, so comparison with -1 is always true
2022-02-17 13:08:58 +03:00
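The two ARM items in the list above come from the fact that the signedness of plain `char` is implementation-defined: it is signed on typical x86 targets but unsigned by default on ARM Linux, so a test like `if (ival < 0)` on a plain `char` can never succeed there. One portable pattern is to spell out the signedness; the helper below is a hypothetical illustration, not the patched code:

```cpp
#include <cassert>
#include <climits>

// True where plain char is signed (e.g. x86), false where it is unsigned
// (e.g. ARM Linux); shown only to make the platform difference visible.
constexpr bool kPlainCharIsSigned = (CHAR_MIN < 0);

// Taking signed char instead of char makes the sign test meaningful on
// every platform.
inline bool isNegativeByte(signed char ival)
{
  return ival < 0;
}
```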
973e5024d8 MCOL-4957 Fix performance slowdown for processing TIMESTAMP columns.
Part 1:
 As part of MCOL-3776 to address synchronization issue while accessing
 the fTimeZone member of the Func class, mutex locks were added to the
 accessor and mutator methods. However, this slows down processing
 of TIMESTAMP columns in PrimProc significantly as all threads across
 all concurrently running queries would serialize on the mutex. This
 is because PrimProc only has a single global object for the functor
 class (class derived from Func in utils/funcexp/functor.h) for a given
 function name. To fix this problem:

   (1) We remove the fTimeZone as a member of the Func derived classes
   (hence removing the mutexes) and instead use the fOperationType
   member of the FunctionColumn class to propagate the timezone values
   down to the individual functor processing functions such as
   FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc.

   (2) To achieve (1), a timezone member is added to the
   execplan::CalpontSystemCatalog::ColType class.
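The structural change in Part 1 can be sketched as follows: the shared functor keeps no mutable timezone state, and the timezone travels with the per-call type information instead, so concurrent callers need no mutex. `ColTypeSketch` and `FuncSketch` are hypothetical stand-ins for CalpontSystemCatalog::ColType and the Func-derived classes, and the conversion is reduced to adding a fixed offset:

```cpp
#include <cassert>

// Hypothetical stand-in for the type information passed with each call.
// The commit adds a timezone member to ColType and stores it as a long.
struct ColTypeSketch
{
  long timeZone = 0;  // assumed here to be a UTC offset in seconds
};

// Hypothetical stand-in for a Func-derived functor. It has no data members
// and only const member functions, so a single global instance can be
// shared by all threads without a mutex.
class FuncSketch
{
 public:
  long long toLocalSeconds(long long utcSeconds, const ColTypeSketch& op) const
  {
    // The timezone comes from the per-call argument, not from shared state.
    return utcSeconds + op.timeZone;
  }
};
```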

Part 2:
 Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime()
 and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds
 since unix epoch and broken-down representation. These functions in turn call
 the C library function localtime_r() which currently has a known bug of holding
 a global lock via a call to __tz_convert. This significantly reduces performance
 in multi-threaded applications where multiple threads concurrently call
 localtime_r(). More details on the bug:
   https://sourceware.org/bugzilla/show_bug.cgi?id=16145

 This bug in localtime_r() caused processing of the Functors in PrimProc to
 slow down significantly, since query execution processes the Functors code
 in a multi-threaded manner.

 As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime()
 and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion
 (done in dataconvert::timeZoneToOffset()) during the execution plan
 creation in the plugin. Note that localtime_r() is only called when the
 time_zone system variable is set to "SYSTEM".

 This fix also required changing the timezone type from a std::string to
 a long across the system.
2022-02-14 14:12:27 -05:00
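Once the timezone has been resolved to a numeric offset in the plugin, as Part 2 of the commit above describes, converting seconds since the epoch to a broken-down time needs only integer arithmetic and no call to localtime_r(). A sketch under the assumption of a fixed UTC offset (no DST transitions inside the converted range); the struct and function names are hypothetical, and the date computation uses Howard Hinnant's well-known days-to-civil algorithm rather than ColumnStore's own code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical broken-down time; not MySQL's MYSQL_TIME.
struct BrokenDownTime
{
  int64_t year;
  unsigned month, day, hour, minute, second;
};

// Howard Hinnant's days-to-civil algorithm (proleptic Gregorian calendar):
// z is the number of days since 1970-01-01.
inline void civilFromDays(int64_t z, int64_t& y, unsigned& m, unsigned& d)
{
  z += 719468;
  const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
  const unsigned doe = static_cast<unsigned>(z - era * 146097);               // [0, 146096]
  const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
  y = static_cast<int64_t>(yoe) + era * 400;
  const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // [0, 365]
  const unsigned mp = (5 * doy + 2) / 153;                        // [0, 11]
  d = doy - (153 * mp + 2) / 5 + 1;                               // [1, 31]
  m = mp < 10 ? mp + 3 : mp - 9;                                  // [1, 12]
  y += (m <= 2);
}

// Convert epoch seconds to local broken-down time using an offset resolved
// once, up front. No localtime_r(), so no global lock in the hot path.
inline BrokenDownTime gmtSecToLocal(int64_t epochSec, long tzOffsetSec)
{
  const int64_t local = epochSec + tzOffsetSec;
  int64_t days = local / 86400;
  int64_t secs = local % 86400;
  if (secs < 0)  // floor division for times before 1970
  {
    secs += 86400;
    --days;
  }
  BrokenDownTime t{};
  civilFromDays(days, t.year, t.month, t.day);
  t.hour = static_cast<unsigned>(secs / 3600);
  t.minute = static_cast<unsigned>(secs % 3600 / 60);
  t.second = static_cast<unsigned>(secs % 60);
  return t;
}
```

For example, 1700000000 seconds with a zero offset is 2023-11-14 22:13:20, and with a +3 hour offset it becomes 2023-11-15 01:13:20.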
27dea733c5 MCOL-4841 dev port: run large join without OOM 2022-02-09 17:33:55 -06:00
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
01f3ceb437 replace header guards with #pragma once 2022-01-21 15:24:58 +00:00
6393c6d019 MCOL-4810 Add support for a missing operation for longStrings. 2021-10-28 10:02:02 +03:00
3de038c1da MCOL-4876 This patch enables a continuous buffer to be used by ColumnCommand and aligns BPP::blockData,
which in most cases was unaligned
2021-10-06 09:23:40 +00:00
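The commit above does not say which alignment BPP::blockData needs. As a sketch only, assuming a 64-byte boundary (a common cache-line and SIMD width), C++17's aligned `operator new` can provide an aligned buffer; `makeAlignedBuffer` and `AlignedDeleter` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Deleter that releases memory obtained from aligned operator new (C++17).
struct AlignedDeleter
{
  std::align_val_t alignment;
  void operator()(unsigned char* p) const noexcept { ::operator delete(p, alignment); }
};

using AlignedBuffer = std::unique_ptr<unsigned char, AlignedDeleter>;

// Hypothetical helper: allocate `size` bytes starting on an `alignment`-byte
// boundary (alignment must be a power of two). 64 is an assumed default.
inline AlignedBuffer makeAlignedBuffer(std::size_t size, std::size_t alignment = 64)
{
  const auto al = static_cast<std::align_val_t>(alignment);
  auto* p = static_cast<unsigned char*>(::operator new(size, al));
  return AlignedBuffer(p, AlignedDeleter{al});
}
```

The deleter carries the alignment because memory from aligned `operator new` must be released with the matching aligned `operator delete`.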
6a4140394d [MCOL-4829] More accurate memory counting 2021-09-07 19:52:20 +03:00
7fea3c988e [MCOL-4829] Compression for the temp disk-based aggregation files 2021-09-02 19:30:25 +03:00
46cf13ffa8 Merge pull request #2101 from denis0x0D/MCOL-4810_2
MCOL-4810 Redundant copying and wasting memory in PrimProc
2021-08-27 14:05:51 +03:00
7bda598fbf MCOL-4810 Redundant copying and wasting memory in PrimProc
This patch eliminates copying of long strings into the bytestream.
2021-08-26 12:16:23 +03:00
5c5f103f98 MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
73e710ed52 Add ctest for google unittests 2021-08-02 19:41:04 +03:00
3d557a2f1e Merge pull request #2044 from dhall-MariaDB/MCOL-3738
MCOL-3738 COUNT(DISTINCT) with multiple parms
2021-07-12 07:34:56 -04:00
51a8ffcb6a Fix sumavgoverflow.sql test 2021-07-09 22:41:28 +00:00
76607be63a MCOL-3738 COUNT(DISTINCT) with multiple parms
Fixed regression
Added a few more mtr tests
2021-07-09 09:07:03 -05:00
f81f743282 Replace underlying type for avg and sum for int types from long double to wide decimal 2021-07-08 17:04:43 +00:00
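A sketch of why an integer accumulator is preferable to long double for SUM/AVG over integer columns, as in the commit above: the integer sum is exact, while a long double accumulator rounds once the running total needs more significant bits than its mantissa holds. The GCC/Clang `__int128` extension stands in here for ColumnStore's wide decimal type, and `sumInt64` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// GCC/Clang extension, used only as a stand-in for a wide decimal type.
using int128 = __int128;

// Summing int64 values into a 128-bit integer is exact and cannot overflow
// for fewer than 2^64 inputs.
inline int128 sumInt64(const int64_t* values, std::size_t n)
{
  int128 acc = 0;
  for (std::size_t i = 0; i < n; ++i)
    acc += values[i];
  return acc;
}
```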
1113470551 MCOL-4738 AVG gives wrong results with strict_aliasing
A fix that works with strict_aliasing
2021-07-07 13:08:32 -05:00
8988253ff4 Merge pull request #2031 from mariadb-corporation/bar-develop-MCOL-4801
MCOL-4801 Replace Row methods getStringLength() and getStringPointer(…
2021-07-07 13:53:19 +04:00
8332ab8974 MCOL-4738 AVG() returns a wrong result
On AMD64 machines, the x87 FPU uses an 80-bit format. The unused bits must be masked for memcmp to work properly. On other architectures, we don't want to mask those bits.
2021-07-06 19:50:00 -05:00
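A sketch of the masking idea from the commit above, assuming the x86-64 System V ABI, where long double is the 80-bit x87 format stored in a 16-byte slot, so only the first 10 bytes carry the value and the rest is padding with unspecified contents. `longDoubleBytesEqual` is a hypothetical helper, not the code from the commit:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Compare two long doubles bytewise while ignoring x87 padding bytes.
inline bool longDoubleBytesEqual(const long double& a, const long double& b)
{
#if defined(__x86_64__) || defined(__i386__)
  constexpr std::size_t kSignificantBytes = 10;  // 80-bit x87 value; rest is padding
#else
  constexpr std::size_t kSignificantBytes = sizeof(long double);  // assume no padding
#endif
  return std::memcmp(&a, &b, kSignificantBytes) == 0;
}
```

Comparing all `sizeof(long double)` bytes instead could report two equal values as different whenever their padding bytes happen to differ, which is how a memcmp-based comparison can produce wrong AVG() results.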