1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-30 19:23:07 +03:00
Commit Graph

43 Commits

Author SHA1 Message Date
0be1c3dc8f MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing.
1. Input and output RowGroup's used in GROUP_CONCAT classes
are currently allocating a raw memory buffer of size equal
to the actual width of the string datatype. As an example,
for the following query:
  SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with default width, the input
RowGroup containing the target rows to be concatenated will
assign 64kb of memory for every input row in the RowGroup.
This is wasteful as actual field values in real workloads
would be much smaller. We fix this by enabling the
RowGroup to use the StringStore when the RowGroup contains
long strings.

2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer is equal to the
row size for the output RowGroup. For the above scenario,
using the default group_concat_max_len (which is a server
variable that sets the maximum length of the GROUP_CONCAT string)
value of 1mb, the buffer size would be
(1mb + 64kb + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3gb, this buffer
size would be ~3gb. Now if the runtime initiates several
instances of RowAggregation, total memory consumption by
PrimProc could exceed the hardware memory limits causing the
OS OOM to kill the process. We fix this problem by again
enabling the StringStore for the NULL row allocation.

3. In the plugin code in buildAggregateColumn(), there is
an integer overflow when the server group_concat_max_len
variable (which is an uint32_t) is set to a value > INT32_MAX
(such as 3gb) and is assigned to
CalpontSystemCatalog::ColType::colWidth (which is an int32_t).
As a short term fix, we saturate the assigned value to colWidth
to INT32_MAX. Proper fix would be to upgrade
CalpontSystemCatalog::ColType::colWidth to an uint32_t.
2023-05-01 13:06:23 -04:00
c2d0fa24da replace boost::shared_array<T> to std::shared_ptr<T[]> 2023-04-14 10:33:27 +00:00
a508b86091 remove boost/shared_array include 2023-04-14 09:42:50 +00:00
6c32c658d5 MCOL-5385: Delete RowGroup::setData and make Pointer ctor explicit (#2808)
* Delete RowGroup::setData and make Pointer ctor explicit

* some push_backs replaced with emplace_backs

* Fixes of review notes
2023-04-13 03:55:30 +03:00
b53c231ca6 MCOL-271 empty strings should not be NULLs (#2794)
This patch improves handling of NULLs in textual fields in ColumnStore.
Previously empty strings were considered NULLs and it could be a problem
if data scheme allows for empty strings. It was also one of major
reasons of behavior difference between ColumnStore and other engines in
MariaDB family.

Also, this patch fixes some other bugs and incorrect behavior, for
example, incorrect comparison for "column <= ''" which evaluates to
constant True for all purposes before this patch.
2023-03-30 21:18:29 +03:00
d0eea0ffe8 MCOL-5385 This patch reduces RAM consumption and adds GROUP_CONCAT RAM accounting feature 2023-01-11 09:52:10 +00:00
1714b75434 Non working attempt to do MCOL-5227 2022-10-31 14:56:32 +02:00
0863ecd279 Replace getBinaryField 2022-08-25 18:21:43 +03:00
3b6449842f Merge branch 'develop' into MCOL-4841
# Conflicts:
#	exemgr/main.cpp
#	oam/etc/Columnstore.xml.singleserver
#	primitives/primproc/primproc.cpp
2022-06-09 10:07:26 -05:00
53b9a2a0f9 MCOL-4580 extent elimination for dictionary-based text/varchar types
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.

The actual patch does have all the code there but miss one important
step: we do not keep collation index, we keep charset index. Because of
this, some of the tests in the bugfix suite fail and thus main
functionality is turned off.

The reason of this patch to be put into PR at all is that it contains
changes that made CHAR/VARCHAR columns unsigned. This change is needed in
vectorization work.
2022-03-02 23:53:39 +03:00
3919c541ac New warnfixes (#2254)
* Fix clang warnings

* Remove vim tab guides

* initialize variables

* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length

* Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison

* chars are unsigned on ARM, having  if (ival < 0) always false

* chars are unsigned by default on ARM and comparison with -1 if always true
2022-02-17 13:08:58 +03:00
27dea733c5 MCOL4841 dev port run large join without OOM 2022-02-09 17:33:55 -06:00
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
9794f24369 MCOL-4801 Replace Row methods getStringLength() and getStringPointer() to getConstString() 2021-07-06 21:15:32 +04:00
69911c2710 A joint patch for MCOL-4614, MCOL-4615, MCOL-4660 (decimal to string conversion)
This patch fixes:
- MCOL-4614 calShowPartitions() precision loss for huge narrow decimal
- MCOL-4615 GROUP_CONCAT() precision loss for huge narrow decimal
- MCOL-4660 Narow decimal to string conversion is inconsistent about zero integral

Changes:
- Implementing Row::getDecimalField()

- Removing double arithmetic from the code printing DECIMAL values
  in TypeHandlerXDecimal::format64() and GroupConcator::outputRow().
  Using Decimal::toString() instead.

- Rewriting Decimal::toStringTSInt64(). The old implementation
  was wrong, too complex and slow (used unnecessary memmove, memcpy).

An additional cleanup:
- Removing the ENGINE=COLUMNSTORE clause from tests for MCOL-4532 and MCOL-4640
  type_decimal.test is combinations-aware. It's run two times with
  default_storage_engine=MyISAM and default_storage_engine=COLUMNSTORE.
  So the CREATE TABLE statements should not specify the engine explicitly.
- Adding --disable_warnings in the old fixed test.
  We needed to suppress warnings when the MyISAM combination is being run.
  Previously the table was erroneously created with ENGINE=COLUMNSTORE
  even with the MyISAM combination run. So warning were not generated.
2021-04-05 16:36:19 +04:00
494bde61e1 MCOL-4409 Moved static Decimal conversion methods into VDecimal class
MCOL-4409 This patch combines VDecimal and Decimal and makes
IDB_Decimal an alias for the result class

MCOL-4409 More boilerplate reduction in Func_mod

Removed couple TSInt128::toType() methods
2020-11-30 12:08:52 +00:00
58495d0d2f MCOL-4387 Convert dataconvert::decimalToString() into VDecimal and TSInt128 methods 2020-11-18 13:53:16 +00:00
129d5b5a0f MCOL-4174 Review/refactor frontend/connector code 2020-11-18 13:53:15 +00:00
7d3d828790 MCOL-641 This commit fixes regression introduced by an earlier
commit for group_concat() on a narrow decimal field.
2020-11-18 13:52:20 +00:00
638202417f MCOL-4171 2020-11-18 13:52:19 +00:00
a61c190668 MCOL-641 Remove memset from group_concat. 2020-11-18 13:51:26 +00:00
778ff607c8 MCOL-641 This commit disables length estimation for DECIMALs. 2020-11-18 13:51:26 +00:00
a7fcf39f2a MCOL-641 Fixed group_concat for narrow-DECIMALs. 2020-11-18 13:51:26 +00:00
f63611c422 MCOL-641 This commit adds support for group_concat w/o ORDER BY.
Small refactoring in Row methods.
2020-11-18 13:51:26 +00:00
06e50e0926 MCOL-3536 collation 2020-05-26 12:42:11 -05:00
04a9027782 MCOL-3904 group_concat with sum from subquery in order by 2020-03-26 13:35:15 -05:00
7489d0bfd0 MCOL-3625 Rename packages
Rename packages to MariaDB-columnstore-engine, MariaDB-columnstore-libs
and MariaDB-columnstore-platform.

Also add the "columnstore-" prefix the the components so that MariaDB's
packaging system understands then and add a line to include them in
MariaDB's packaging.

In addition
* Fix S3 building for dist source build
* Fix Debian 10 dependency issue
* Fix git handling for dist builds
* Add support for MariaDB's RPM building
* Use MariaDB's PCRE and readline
* Removes a few dead files
* Fix Boost noncopyable includes
2019-12-04 11:04:39 +00:00
6ccdb4ac10 1. Fix for group_concat(distinct ... ) to use all unique values in a row.
2. Explicitly set array bounds for Row::hash() and Row::equals().
2019-08-15 01:15:05 -04:00
a09a9d5d0f Mass substitution 'Corporaton' -> 'Corporation' 2019-08-07 14:43:25 -05:00
e89d1ac3cf MCOL-265 Add support for TIMESTAMP data type 2019-04-23 00:00:09 -04:00
3f2c753947 MCOL-1822-c final checkin 2019-03-05 09:33:39 -06:00
cf056e42ac Merge branch 'develop-1.2' into MCOL-1822-c 2019-02-27 13:20:45 -06:00
c5b9ae11e5 MCOL-1822 add LONG DOUBLE support 2019-01-29 09:55:43 -06:00
6d65b13852 MCOL-901 Significanlty reduced memory consumption for group_concat().
RowGroup default constructor allocates memory too generously.

    Removed commented code from groupconcat.cpp.
2019-01-29 16:02:37 +03:00
6fa7dded6f MCOL-1201 manual rebase with develop. Obsoletes branch MCOL-1201 2018-06-05 13:54:17 -05:00
c40903de9b MCOL-392 Apply astyle
Make this branch apply our style guidelines
2018-05-01 09:52:26 +01:00
3c1ebd8b94 MCOL-392 Add initial TIME datatype support 2018-04-30 09:42:41 +01:00
01446d1e22 Reformat all code to coding standard 2017-10-26 17:18:17 +01:00
4adb50f171 MCOL-707 Fix ExeMgr's memory accounting
ExeMgr uses ResourceManager to count memory usage. If a usage exceeded
error occurs the counting wasn't reset and subsequent usage attempts in
the same ExeMgr thread would error.

This patch moves the in-class accounting for GroupConcat and others so
that it happens before the error is detected. The memory usage counter
is then decremented correctly on the class destructor.
2017-05-10 11:45:31 +01:00
6128293ad3 MCOL-671 Fix TEXT/BLOB single row SELECT WHERE
pDictionaryScan won't work for BLOB/TEXT since it requires searching the
data file and rebuilding the token from matches. The tokens can't be
rebuild correctly due the bits in the token used for block counts. This
patch forces the use of pDictionaryStep instead for WHERE conditions.

In addition this patch adds support for TEXT/BLOB in various parts of
the job step processing. This fixes things like error 202 during an
UPDATE with a join condition on TEXT/BLOB columns.
2017-04-21 11:21:59 +01:00
ffcfc41563 MCOL-507 Further ExeMgr performance improvements
This does the following:

* Switch resource manager to a singleton which reduces the amount of
times the XML data is scanned and objects allocated.
* Make the I_S tables use the FE implementation of the system catalog
* Make the I_S.columnstore_columns table use the RID list cache
* Make the extentmap pre-allocate a vector instead of many small allocs
2017-01-16 12:33:27 +00:00
b1b60065d9 build fails with boost linking errors 2016-06-03 13:55:09 +03:00
f6afc42dd0 the begginning 2016-01-06 14:08:59 -06:00