The fix is simple: enable subtotals in single-phase aggregation and
disable parallel processing when there are subtotals and aggregation is
single-phase.
Fixes MCOL-5643.
The problem was that different views with same column names in GROUP BY
and on the SELECT clause produced an error about "projection column is
not an aggergate neither in GROUP BY list."
This was due to incorrect search in expressions's list that lead to
duplicate columns in GROUP BY list.
This is "productization" of an old code that would enable extent
elimination for dictionary columns.
This concrete patch enables it, fixes perfomance degradation (main
problem with old code) and also fixes incorrect behavior of cpimport.
For UPDATEs involving a single table, the server call to handler::direct_update_rows() is used to correctly set the count for the number of updated rows in the UPDATE statement.
However, for UPDATEs involving multi-tables, the server does not call handler::direct_update_rows(). This patch adds support to correctly report the number of updated rows to the client by setting
multi_update::updated and multi_update::found in handler::rnd_end().
We add intermediate calculations in int128_t when target is UBIGINT and
check for overflow before converting into the UBIGINT. This is so
because we can overflow on addition and multiplication, with (some)
signed operands or both unsigned.
calpontsys.syscolumn syscat table to be latin1.
This change is done in one of the ctors of pColStep which is
initiated while building the job list from the execution plan.
Adds a special column which helps to differentiate data and rollups of
various depts and a simple logic to row aggregation to add processing of
subtotals.
1. Extend the following CalpontSystemCatalog member functions to
set CalpontSystemCatalog::ColType::charsetNumber, after the
system catalog update to add charset number to calpontsys.syscolumn
in MCOL-5005:
CalpontSystemCatalog::lookupOID
CalpontSystemCatalog::colType
CalpontSystemCatalog::columnRIDs
CalpontSystemCatalog::getSchemaInfo
2. Update cpimport to use the CHARSET_INFO object associated with the
charset number retrieved from the system catalog, for a
dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate
long strings that exceed the target column character length.
3. Add MTR test cases.
1. Extend the calpontsys.syscolumn system catalog table
with a new column, 'charsetnum'.
'charsetnum' field is set to the 'number' member of the
'charset_info_st' struct defined in the server in m_ctype.h.
For CHAR/VARCHAR/TEXT column types, 'charset_info_st' is
initialized to the charset/collation of the column, which
is set at the column-level or at the table-level in the DDL.
For BLOB/VARBINARY binary column types, 'charset_info_st' is
initialized to my_charset_bin (charsetnum=63).
For all other column types, charsetnum is set to 0.
2. Add support for the newly added 'charsetnum' column in the
automatic system catalog upgrade logic in dbbuilder.
For existing table definitions, charsetnum for the column is
defaulted to 0.
3. Add MTR test case that creates a few table definitions with
a range of charset/collation combinations and queries the
calpontsys.syscolumn system catalog table with the charsetnum
field for the columns in the table DDLs.
feat(charset)!: utf8 is a new charset default and utf8_general_ci is a new collation default in the engine configuration file shipped
---------
Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>
Co-authored-by: mariadb-DanielLee <daniel.lee@mariadb.com>
DMLProcessor functor earlier did not check if the DBRM was in read-only mode.
This allowed DML statements to continue execution to the point where it locks
the table and then sends the statement down to the WriteEngineServer, which
ultimately returns back in an error state to DMLProc when it fails to perform
BRM updates due to DBRM in read-only mode. This caused a lingering table lock
in the system which could only be cleared on a system restart.
As a fix, we add a check in the DMLProcessor functor to detect if DBRM is in
read only mode, and if so, return back early in the execution of the DML
statement.
Remove redundant cast.
As C-style casts with a type name in parantheses are interpreted as static_casts this literally just changes the interpretation around (and forces an implicit cast to match the return value of the function).
Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency.
Make consistent with relation between BIGINTNULL and BIGINTEMPTYROW & make adapted cast behaviour due to NULL markers more intuitive. (After this change we can simply block the highest possible uint64_t value and if a cast results in it, print the next lower value (2^64 - 2). Previously, (2^64 - 1) was able to be printed, but (2^64 - 2) as being blocked by the UBIGINTNULL constant was not, making finding the appropiate replacement value to give out more confusing.
Introduce MAX_MCS_UBIGINT and MIN_MCS_BIGINT and adapt casts.
Adapt casting to BIGINT to remove NULL marker error.
Add bugfix regression test for MCOL 4632
Add regression test for mcol_4648
Revert "Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency."
This reverts commit 83eac11b18937ecb0b4c754dd48e4cb47310f620.
Due to backwards compatability issues.
Refactor casting to MCS[U]Int to datatype functions.
Update regression tests to include other affected datatypes.
Apply formatting.
Refactor according to PR review
Remove redundant new constant, switch to using already existing constant.
Adapt nullstring casting to EMPTYROW markers for backwards compatability.
Adapt tests for backward compatability behaviour allowing text datatypes to be casted to EMPTYROW constant.
Adapt mcol641-functions test according to bug fix.
Update tests according to new expected behaviour.
Adapt tests to new understanding of issue.
Update comments/documentation for MCOL_4632 test.
Adapt to new cast limit logic.
Make bracketing consistent.
Adapt previous regression test to new expected behaviour.
This patch:
1. Properly processes situation when pm join result count is exceeded.
2. Adds session variable 'columnstore_max_pm_join_result_count` to control the limit.
This patch improves/fixes the existing handling of CHARSET and
COLLATION symbols in the ColumnStore DDL parser.
Also, add fCollate and fCharsetNum member variables to the
ddlpackage::ColumnType class.
This patch:
1. Handles corner case when the bucket exceeded the memory limit, but we cannot redistribute the data in this bucket into new buckets based on a hash algorithm, because the rows have the same values.
2. Adds force option for disk join step.
3. Add a option to contol the depth of the partition tree.
Internal memory representation of MEDIUMINT datatype uses 24 bits. This is
true for both MariaDB server as well as ColumnStore. MCS plugin code uses
TypeHandlerSInt24 and TypeHandlerUInt24 classes to respectively convert the
binary representation of the signed and unsigned MEDIUMINT values passed by
the server to the plugin. The plugin then outputs the text representation
of these values into an open file descriptor which is piped to cpimport
for the final load into the MCS db files.
The TypeHandlerXInt24 classes were earlier incorrectly using
WriteBatchField::ColWriteBatchXInt32() functions which operate on a 4 byte
buffer. This resulted in incorrect parsing of MEDIUMINT values. As a fix,
we implement WriteBatchField::ColWriteBatchXInt24() functions which
correctly handle the 24 bit input buffer used for MEDIUMINT datatype.
This patch is the JSON_ARRAYAGG clone of the changes done in MCOL-5429
where we enabled usage of StringStore for long strings in
GROUP_CONCAT() processing to reduce memory footprint of PrimProc and
thus avoiding a potential OS triggered OOM crash.
For the following query:
select item from (
select item from (select a as item from t1) tt
union all
select item from (select a as item from t1) tt
) ttt;
There is an if predicate in buildSimpleColFromDerivedTable() that compares
the outermost query field name (ttt.item) to the returned column list of
the inner query (tt.item) when building the returned column list of the
outer most query. In the above query example, the inner query field name
is an alias set in the inner most query and is set to "`tt`.`item`",
while the outermost query field name is set to "item". The use of
backticks "`" in the inner query alias is causing the execution to
not enter the if block which creates the SimpleColumn for the outermost
query field name. As a fix, we strip off the backticks from the inner
query alias.
1. Input and output RowGroup's used in GROUP_CONCAT classes
are currently allocating a raw memory buffer of size equal
to the actual width of the string datatype. As an example,
for the following query:
SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with default width, the input
RowGroup containing the target rows to be concatenated will
assign 64kb of memory for every input row in the RowGroup.
This is wasteful as actual field values in real workloads
would be much smaller. We fix this by enabling the
RowGroup to use the StringStore when the RowGroup contains
long strings.
2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer is equal to the
row size for the output RowGroup. For the above scenario,
using the default group_concat_max_len (which is a server
variable that sets the maximum length of the GROUP_CONCAT string)
value of 1mb, the buffer size would be
(1mb + 64kb + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3gb, this buffer
size would be ~3gb. Now if the runtime initiates several
instances of RowAggregation, total memory consumption by
PrimProc could exceed the hardware memory limits causing the
OS OOM to kill the process. We fix this problem by again
enabling the StringStore for the NULL row allocation.
3. In the plugin code in buildAggregateColumn(), there is
an integer overflow when the server group_concat_max_len
variable (which is an uint32_t) is set to a value > INT32_MAX
(such as 3gb) and is assigned to
CalpontSystemCatalog::ColType::colWidth (which is an int32_t).
As a short term fix, we saturate the assigned value to colWidth
to INT32_MAX. Proper fix would be to upgrade
CalpontSystemCatalog::ColType::colWidth to an uint32_t.