MCOL-271 introduced a bug in JSON_VALUE that was discovered during
implementation of ASAN builds. The changes here restore normal
functionality.
In short, changes in MCOL-271 introduced a local variable instead of
reference to a string in ConstantColumn's fResult.strVal. The handling
of ConstantColumn is different because ConstantColumn's value is used
to initialize JSON path once. JSON path value holds pointer to data it
does not own and if there are two or more rows the data can be corrupted
and/or be out of stack bounds.
The changes here introduce reference to a NullString that is held in the
ConstantColumn's fResult.strVal and uses appropriate functions to obtain
data from the NullString. CC's fResult is held by CC and strVal is also
neither changing nor moving during operation, which allow JSON path to
hold correct pointers during multi-row operation.
A new option is added (-n/--no-clean-install) that allows a user to
re-install ColumnStore without deleting the existing db files. This is
useful for testing new code changes in the engine without the need to
re-create the database tables to test the code changes.
Internal memory representation of MEDIUMINT datatype uses 24 bits. This is
true for both MariaDB server as well as ColumnStore. MCS plugin code uses
TypeHandlerSInt24 and TypeHandlerUInt24 classes to respectively convert the
binary representation of the signed and unsigned MEDIUMINT values passed by
the server to the plugin. The plugin then outputs the text representation
of these values into an open file descriptor which is piped to cpimport
for the final load into the MCS db files.
The TypeHandlerXInt24 classes were earlier incorrectly using
WriteBatchField::ColWriteBatchXInt32() functions which operate on a 4 byte
buffer. This resulted in incorrect parsing of MEDIUMINT values. As a fix,
we implement WriteBatchField::ColWriteBatchXInt24() functions which
correctly handle the 24 bit input buffer used for MEDIUMINT datatype.
This patch is the JSON_ARRAYAGG clone of the changes done in MCOL-5429
where we enabled usage of StringStore for long strings in
GROUP_CONCAT() processing to reduce memory footprint of PrimProc and
thus avoiding a potential OS triggered OOM crash.
For the following query:
select item from (
select item from (select a as item from t1) tt
union all
select item from (select a as item from t1) tt
) ttt;
There is an if predicate in buildSimpleColFromDerivedTable() that compares
the outermost query field name (ttt.item) to the returned column list of
the inner query (tt.item) when building the returned column list of the
outer most query. In the above query example, the inner query field name
is an alias set in the inner most query and is set to "`tt`.`item`",
while the outermost query field name is set to "item". The use of
backticks "`" in the inner query alias is causing the execution to
not enter the if block which creates the SimpleColumn for the outermost
query field name. As a fix, we strip off the backticks from the inner
query alias.
1. Input and output RowGroup's used in GROUP_CONCAT classes
are currently allocating a raw memory buffer of size equal
to the actual width of the string datatype. As an example,
for the following query:
SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with default width, the input
RowGroup containing the target rows to be concatenated will
assign 64kb of memory for every input row in the RowGroup.
This is wasteful as actual field values in real workloads
would be much smaller. We fix this by enabling the
RowGroup to use the StringStore when the RowGroup contains
long strings.
2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer is equal to the
row size for the output RowGroup. For the above scenario,
using the default group_concat_max_len (which is a server
variable that sets the maximum length of the GROUP_CONCAT string)
value of 1mb, the buffer size would be
(1mb + 64kb + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3gb, this buffer
size would be ~3gb. Now if the runtime initiates several
instances of RowAggregation, total memory consumption by
PrimProc could exceed the hardware memory limits causing the
OS OOM to kill the process. We fix this problem by again
enabling the StringStore for the NULL row allocation.
3. In the plugin code in buildAggregateColumn(), there is
an integer overflow when the server group_concat_max_len
variable (which is an uint32_t) is set to a value > INT32_MAX
(such as 3gb) and is assigned to
CalpontSystemCatalog::ColType::colWidth (which is an int32_t).
As a short term fix, we saturate the assigned value to colWidth
to INT32_MAX. Proper fix would be to upgrade
CalpontSystemCatalog::ColType::colWidth to an uint32_t.
* Add dmesg to regressionlog step
* dmesg from the outside
* put dmesg one level higher
* Add server logs
* Archive the whole queries dir
* Remove dmesg
* Output details into CI logs
* Run the script and print out diffs
* Logging in -log stage + put logs on s3
* Typo
* Install find + put logs in a dir
* Proper logs location
* Put all output in the script + put script in the regression step
* Typo
* Move more logic to script
* Add a guard
* Conditional execution
* Move bash -c to the beginning
* Never fail go.sh, reg-logs checks for failures
* Add env var to start regression with custom test set (name for the step
is broken)
* Typos + only exec test000 once
* Proper expression with if
* Remove timeout to get results
* Remove flag for full regression for now
* Remove unnecessary var
* Make build scripts color brighter
* better colors, draw deps and ninja generator options
* Add color spinner for configure and install make changed to cmake --build and cmake --install
* Clean more builds log garbage
* Fixes of bugs from ASAN warnings, part one
* MQC as static library, with nifty counter for global map and mutex
* Switch clang to 16
* link messageqcpp to execplan
Add getDecimalVal in func_round and func_truncate for getting value while filtering
MCOL-4991 Solving TRUNCATE/ROUND/CEILING functions on TIME/DATETIME/TIMESTAMP
Update func_cast.cpp