BLOB fields did not work as grouping keys at all, they were assigned
value NULL for any value, be it NULL or not. The fix is in the
rowaggregation.cpp in the initMapping(), a switch/case branch was added
to handle BLOB field copying there.
Also, TEXT columns did not distinguish between NULL and empty string in
the grouping algorithm, now they do. The fix is in the equals()
function, now we specifically check for isNull() equality between
values.
Adds a special column which helps to differentiate data and rollups of
various depts and a simple logic to row aggregation to add processing of
subtotals.
1. Input and output RowGroup's used in GROUP_CONCAT classes
are currently allocating a raw memory buffer of size equal
to the actual width of the string datatype. As an example,
for the following query:
SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with default width, the input
RowGroup containing the target rows to be concatenated will
assign 64kb of memory for every input row in the RowGroup.
This is wasteful as actual field values in real workloads
would be much smaller. We fix this by enabling the
RowGroup to use the StringStore when the RowGroup contains
long strings.
2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer is equal to the
row size for the output RowGroup. For the above scenario,
using the default group_concat_max_len (which is a server
variable that sets the maximum length of the GROUP_CONCAT string)
value of 1mb, the buffer size would be
(1mb + 64kb + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3gb, this buffer
size would be ~3gb. Now if the runtime initiates several
instances of RowAggregation, total memory consumption by
PrimProc could exceed the hardware memory limits causing the
OS OOM to kill the process. We fix this problem by again
enabling the StringStore for the NULL row allocation.
3. In the plugin code in buildAggregateColumn(), there is
an integer overflow when the server group_concat_max_len
variable (which is an uint32_t) is set to a value > INT32_MAX
(such as 3gb) and is assigned to
CalpontSystemCatalog::ColType::colWidth (which is an int32_t).
As a short term fix, we saturate the assigned value to colWidth
to INT32_MAX. Proper fix would be to upgrade
CalpontSystemCatalog::ColType::colWidth to an uint32_t.
* Fixes of bugs from ASAN warnings, part one
* MQC as static library, with nifty counter for global map and mutex
* Switch clang to 16
* link messageqcpp to execplan
This patch improves handling of NULLs in textual fields in ColumnStore.
Previously empty strings were considered NULLs and it could be a problem
if data scheme allows for empty strings. It was also one of major
reasons of behavior difference between ColumnStore and other engines in
MariaDB family.
Also, this patch fixes some other bugs and incorrect behavior, for
example, incorrect comparison for "column <= ''" which evaluates to
constant True for all purposes before this patch.
* Fix clang warnings
* Remove vim tab guides
* initialize variables
* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length
* Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison
* chars are unsigned on ARM, having if (ival < 0) always false
* chars are unsigned by default on ARM and comparison with -1 if always true
On AMD64 machines, the fpu is 80 bits. The unused bits must be masked for memcmp to work properly. For other archetectures, we don't want to mask those bits.
* Introduce multigeneration aggregation
* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)
* Reduce memory usage while aggregating
* introduce in-memory generations to better memory utilization
* Try to limit the qty of buckets at a low limit
* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory
* do not dump more than half of rowgroups to disk if generations are
allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
so the generations start more evenly
* Unify temp data location
* Explicitly create temp subdirectories
whether disk aggregation/join are enabled or not
This patch fixes:
- MCOL-4614 calShowPartitions() precision loss for huge narrow decimal
- MCOL-4615 GROUP_CONCAT() precision loss for huge narrow decimal
- MCOL-4660 Narow decimal to string conversion is inconsistent about zero integral
Changes:
- Implementing Row::getDecimalField()
- Removing double arithmetic from the code printing DECIMAL values
in TypeHandlerXDecimal::format64() and GroupConcator::outputRow().
Using Decimal::toString() instead.
- Rewriting Decimal::toStringTSInt64(). The old implementation
was wrong, too complex and slow (used unnecessary memmove, memcpy).
An additional cleanup:
- Removing the ENGINE=COLUMNSTORE clause from tests for MCOL-4532 and MCOL-4640
type_decimal.test is combinations-aware. It's run two times with
default_storage_engine=MyISAM and default_storage_engine=COLUMNSTORE.
So the CREATE TABLE statements should not specify the engine explicitly.
- Adding --disable_warnings in the old fixed test.
We needed to suppress warnings when the MyISAM combination is being run.
Previously the table was erroneously created with ENGINE=COLUMNSTORE
even with the MyISAM combination run. So warning were not generated.
Removed uint128 from joblist/lbidlist.*
Another toString() method for wide-decimal that is EMPTY/NULL aware
Unified decimal processing in WF functions
Fixed a potential issue in EqualCompData::operator() for
wide-decimal processing
Fixed some signedness warnings
For now it consists of only:
using int128_t = __int128;
using uint128_t = unsigned __int128;
All new privitive data types should go into this file in the future.
WF::percentile runtime threw an exception b/c of wrong DT deduced from its argument
Replaced literals with constants
Tought WF_sum_avg::checkSumLimit to use refs instead of values
This commit also adds support in TupleHashJoinStep::forwardCPData,
although we currently do not support wide decimals as join keys.
Row estimation to determine large-side of the join is also updated.