This changeset enables quick (mariadb -q) mode when ColumnStore is
installed. Quick mode prevents the client CLI program from storing too
much data in memory, which avoids out-of-memory conditions.
Add quick-max-column-width=0 to prevent extra garbage dashes in output.
This patch introduces an internal aggregate operator SELECT_SOME that
is automatically added to columns that are not in GROUP BY. It
"computes" some plausible value of the column (actually, the last one
passed).
Along the way it fixes incorrect handling of HAVING being transferred
into WHERE, window function handling, and a few other inconsistencies.
Fixes in the UBSAN-related commit introduced more server-compatible
behavior that differs from our old behavior. As a result, old tests broke
and their results had to be changed. This is what this patch does.
The most important fix here is for a possible buffer overrun in the
DATE_FORMAT() function. A "%W" format, repeated enough times, would
overflow the 256-byte result buffer. Now we use ostringstream to
construct the result, so the overrun can no longer happen.
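For illustration only: "%W" expands to a weekday name of up to 9
characters ("Wednesday"), so 29 repetitions already need 261 bytes, more
than the old 256-byte buffer. The Python sketch below is not the
ColumnStore code; format_weekday_only is a hypothetical mini-formatter
that handles only "%W" and builds its result in a growable list, which
plays the role ostringstream plays in the fix.

```python
import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def format_weekday_only(fmt, day):
    """Hypothetical mini-formatter: expands only %W (full weekday name)."""
    out = []  # growable container, no fixed capacity to overrun
    i = 0
    while i < len(fmt):
        if fmt.startswith("%W", i):
            out.append(WEEKDAYS[day.weekday()])
            i += 2  # skip both characters of the specifier
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)


# 2024-01-03 is a Wednesday, the longest English weekday name.
result = format_weekday_only("%W" * 29, datetime.date(2024, 1, 3))
```

Here len(result) is 9 * 29 = 261, which would not fit into 256 bytes.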
Changes in date/time projection functions made me fix differences
between our behavior and the server's. The new, better behavior is
reflected in changes to the tests' results.
Also, there was incorrect logic in the TRUNCATE() and ROUND() functions
for computing the decimal "shift."
* MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194)
This patch fixes the problem in MCOL-4234 and also generally improves
behavior of GROUP BY.
It does so by introducing a "dummy" aggregate and by wrapping columns
into it. This allows for columns that are not in GROUP BY to be used
more freely, for example, in SELECT * FROM tbl GROUP BY col - all
columns that are not "col" will be wrapped into an aggregate and query
will proceed to execution.
The dummy aggregate itself does nothing more than remember last value
passed into it.
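A minimal Python sketch of the idea, not the engine code: SelectSome
and group_by are hypothetical names, and rows are modelled as plain
dicts. Every column that is not the GROUP BY key is wrapped into an
aggregate that simply remembers the last value passed into it.

```python
class SelectSome:
    """Hypothetical stand-in for the dummy aggregate: keeps the last value."""

    def __init__(self):
        self.value = None

    def update(self, value):
        self.value = value


def group_by(rows, key_col):
    """Group dict rows by key_col, wrapping all other columns in SelectSome."""
    groups = {}
    for row in rows:
        key = row[key_col]
        if key not in groups:
            groups[key] = {c: SelectSome() for c in row if c != key_col}
        for col, agg in groups[key].items():
            agg.update(row[col])
    result = []
    for key, aggs in groups.items():
        out_row = {key_col: key}
        out_row.update({c: a.value for c, a in aggs.items()})
        result.append(out_row)
    return result


rows = [
    {"col": 1, "a": "x", "b": 10},
    {"col": 1, "a": "y", "b": 20},
    {"col": 2, "a": "z", "b": 30},
]
grouped = group_by(rows, "col")
```

For the group col = 1 the sketch reports a = "y" and b = 20, the values
of the last row seen for that group.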
There is also an additional error message that tries to explain what
types of expressions can be wrapped into an aggregate.
* MCOL-5772: incorrect ORDER BY ordering for columns not in GROUP BY (#3214)
When an ORDER BY column is not in GROUP BY and is not an aggregate, and
there is a SELECT column that is also not an aggregate, there was a
problem: ordering happened on the SELECTed column, not the ORDERed one.
This patch fixes that particular problem and also performs some tidying
around the newly added aggregate.
---------
Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>
* MCOL-5328: PCRE based regexp regexp_substr regexp_instr regexp_replace
* Add qa test for MCOL-5328
---------
Co-authored-by: Susil Behera <susil.behera@mariadb.com>
The UPDATE statement wrote NULL when the column being set is DATETIME
and the value is '0000-00-00 00:00:00'. The problem was inside
WriteEngine's handling of UPDATE statements, and this is where the heart
of the change is.
Other changes are related to some obsolete data structures in DML/DDL
handling that were just hanging around there, doing nothing.
Limit test containers by memory, fix cgroup path inside the containers by introducing new ugly setting name
---------
Co-authored-by: drrtuy <roman.nozdrin@mariadb.com>
Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
This changeset contains fixes in Oracle mode tests and in the
implementation of CONCAT_ORACLE. Also, we harmonise our translation
process with the recent changes in the server.
Due to changed behavior of the server, some CREATE VIEW/EXPLAIN
statements began to output unexpected results and needed to be
fixed.
Also, the concatenation operation's name changed. This led to the
disabled func_concat_oracle test being enabled to test it, and it turned
out that our implementation of this function was broken and needed to be
fixed too.
This is a fix to the logging subsystem, nothing else.
The old code expanded an argument into the string but advanced too
little, so if the expansion itself contained an argument's index, it
was expanded again. And again.
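A hedged Python sketch of single-pass expansion. The "%N%" placeholder
syntax and the expand_args name are assumptions made for this
illustration, not the logging subsystem's actual API. The point is that
the scan position moves only through the template, past the whole
placeholder, so text produced by an expansion is never scanned again.

```python
import re

# Assumed placeholder syntax for this sketch: %1%, %2%, ...
PLACEHOLDER = re.compile(r"%([0-9]+)%")


def expand_args(template, args):
    """Expand %N% placeholders in one left-to-right pass over the template."""
    out = []
    i = 0
    while i < len(template):
        match = PLACEHOLDER.match(template, i)
        if match and 1 <= int(match.group(1)) <= len(args):
            out.append(args[int(match.group(1)) - 1])
            i = match.end()  # advance past the whole placeholder
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


# The first argument itself looks like a placeholder; it must stay literal.
message = expand_args("value: %1%, other: %2%",
                      ["contains %2% literally", "B"])
```

With this approach message keeps the "%2%" inside the first argument as
plain text, while the real second placeholder is replaced by "B".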
Fixes MCOL-5643.
The problem was that different views with the same column names in
GROUP BY and in the SELECT clause produced an error about "projection
column is not an aggregate neither in GROUP BY list."
This was due to an incorrect search in the expression list that led to
duplicate columns in the GROUP BY list.
JSON functions were implemented in a way that violated the assumption
of their purity: they should not have any state. This particular patch
fixes the implementation of the JSON_VALUE function.
We add intermediate calculations in int128_t when the target is UBIGINT
and check for overflow before converting to UBIGINT. This is needed
because addition and multiplication can overflow, whether (some)
operands are signed or both are unsigned.
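A small Python sketch of the check, under the assumption that Python's
arbitrary-precision integers stand in for the int128_t intermediate;
checked_ubigint is a hypothetical helper name. The operation is carried
out in the wide type first, and only a result that fits into 64 unsigned
bits is accepted.

```python
import operator

UBIGINT_MAX = 2**64 - 1


def checked_ubigint(op, lhs, rhs):
    """Compute op(lhs, rhs) in a wide type, then range-check for UBIGINT."""
    wide = op(lhs, rhs)  # Python ints do not overflow; stands in for int128_t
    if wide < 0 or wide > UBIGINT_MAX:
        raise OverflowError("result %d does not fit into UBIGINT" % wide)
    return wide


ok = checked_ubigint(operator.add, UBIGINT_MAX, -1)  # unsigned + signed fits

overflowed = []
for op, lhs, rhs in [(operator.add, UBIGINT_MAX, 1),   # both unsigned
                     (operator.mul, 2**32, 2**32),     # both unsigned
                     (operator.add, 5, -10)]:          # signed operand
    try:
        checked_ubigint(op, lhs, rhs)
        overflowed.append(False)
    except OverflowError:
        overflowed.append(True)
```

All three cases in the loop are rejected, covering both the mixed
signed/unsigned and the purely unsigned situations.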
Adds a special column that helps differentiate data rows from rollups of
various depths, and adds simple logic to row aggregation to process
subtotals.
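One way to picture this, as a Python sketch and not the row aggregation
code itself: each input row contributes to its own group and to every
subtotal level, and a depth marker (a hypothetical stand-in for the
special column) tells data rows (depth 0) apart from rollup rows, even
when a real key value happens to be NULL.

```python
from collections import defaultdict


def sum_with_rollup(rows, keys, value):
    """Sum `value` per group and per rollup level, tagged with a depth marker."""
    totals = defaultdict(int)
    for row in rows:
        full_key = tuple(row[k] for k in keys)
        for depth in range(len(keys) + 1):
            # Depth 0 keeps every key; each further depth replaces one
            # more trailing key with None (the rolled-up position).
            kept = len(keys) - depth
            rolled = full_key[:kept] + (None,) * depth
            totals[(depth, rolled)] += row[value]
    return dict(totals)


rows = [
    {"a": 1, "b": 1, "v": 10},
    {"a": 1, "b": 2, "v": 5},
    {"a": 2, "b": 1, "v": 7},
]
totals = sum_with_rollup(rows, ["a", "b"], "v")
```

In this example the entry for (1, (1, None)) is the subtotal 15 for
a = 1, and (2, (None, None)) is the grand total 22.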
1. Extend the following CalpontSystemCatalog member functions to
set CalpontSystemCatalog::ColType::charsetNumber, after the
system catalog update to add charset number to calpontsys.syscolumn
in MCOL-5005:
   CalpontSystemCatalog::lookupOID
   CalpontSystemCatalog::colType
   CalpontSystemCatalog::columnRIDs
   CalpontSystemCatalog::getSchemaInfo
2. Update cpimport to use the CHARSET_INFO object associated with the
charset number retrieved from the system catalog, for a
dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate
long strings that exceed the target column character length.
3. Add MTR test cases.
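To illustrate item 2: truncation has to count characters in the column's
charset, not bytes. The Python sketch below uses Python's codecs as a
stand-in for the CHARSET_INFO object; truncate_to_char_length is a
hypothetical helper and not part of cpimport.

```python
def truncate_to_char_length(raw, max_chars, encoding="utf-8"):
    """Truncate encoded bytes to at most max_chars characters."""
    text = raw.decode(encoding)
    # Slicing a str counts code points, so multi-byte characters stay whole.
    return text[:max_chars].encode(encoding)


utf8_value = "héllo".encode("utf-8")      # 5 characters, 6 bytes
truncated = truncate_to_char_length(utf8_value, 3)

latin1_value = "héllo".encode("latin-1")  # 5 characters, 5 bytes
truncated_latin1 = truncate_to_char_length(latin1_value, 3, "latin-1")
```

The UTF-8 result is 4 bytes long ("hél"), while the same text in
latin-1 truncates to 3 bytes; cutting at a fixed byte count would get
one of the two wrong.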
* Added order by clause to keep results consistent over test runs
* Updated test result for the merging of MCOL-5519
* Updated test results for the merging of MCOL-4632
* Updated test result for the merging of MCOL-5519
* Added missing / to path
* Improved a few test cases
* Fixed test case name
---------
Co-authored-by: root <root@rocky8.localdomain>
1. Extend the calpontsys.syscolumn system catalog table
with a new column, 'charsetnum'.
'charsetnum' field is set to the 'number' member of the
'charset_info_st' struct defined in the server in m_ctype.h.
For CHAR/VARCHAR/TEXT column types, 'charset_info_st' is
initialized to the charset/collation of the column, which
is set at the column-level or at the table-level in the DDL.
For BLOB/VARBINARY binary column types, 'charset_info_st' is
initialized to my_charset_bin (charsetnum=63).
For all other column types, charsetnum is set to 0.
2. Add support for the newly added 'charsetnum' column in the
automatic system catalog upgrade logic in dbbuilder.
For existing table definitions, charsetnum for the column is
defaulted to 0.
3. Add MTR test case that creates a few table definitions with
a range of charset/collation combinations and queries the
calpontsys.syscolumn system catalog table with the charsetnum
field for the columns in the table DDLs.
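The rules from item 1 can be summarised in a short Python sketch.
charsetnum_for and the lookup table are hypothetical; the three collation
ids listed (8, 33, 63) are the standard server ids for those collations,
and a real implementation would read the number from charset_info_st
instead of a table.

```python
# Small illustrative subset of server collation ids.
COLLATION_IDS = {
    "latin1_swedish_ci": 8,
    "utf8_general_ci": 33,
    "binary": 63,
}

TEXT_TYPES = {"CHAR", "VARCHAR", "TEXT"}
BINARY_TYPES = {"BLOB", "VARBINARY"}


def charsetnum_for(column_type, collation=None):
    """Return the charsetnum value stored for a column, per the rules above."""
    column_type = column_type.upper()
    if column_type in BINARY_TYPES:
        return COLLATION_IDS["binary"]   # my_charset_bin
    if column_type in TEXT_TYPES:
        return COLLATION_IDS[collation]  # column-level or table-level collation
    return 0                             # all other column types
```

For example, charsetnum_for("VARCHAR", "utf8_general_ci") gives 33,
charsetnum_for("BLOB") gives 63, and charsetnum_for("INT") gives 0.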
feat(charset)!: utf8 is a new charset default and utf8_general_ci is a new collation default in the engine configuration file shipped
---------
Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>
Co-authored-by: mariadb-DanielLee <daniel.lee@mariadb.com>
Remove redundant cast.
As C-style casts with a type name in parentheses are interpreted as static_casts, this literally just changes the interpretation around (and forces an implicit cast to match the return value of the function).
Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency.
Make this consistent with the relation between BIGINTNULL and BIGINTEMPTYROW and make the adapted cast behaviour due to NULL markers more intuitive. After this change we can simply block the highest possible uint64_t value and, if a cast results in it, print the next lower value (2^64 - 2). Previously, 2^64 - 1 could be printed, but 2^64 - 2 could not because it was blocked by the UBIGINTNULL constant, which made finding the appropriate replacement value more confusing.
Introduce MAX_MCS_UBIGINT and MIN_MCS_BIGINT and adapt casts.
Adapt casting to BIGINT to remove NULL marker error.
Add bugfix regression test for MCOL-4632
Add regression test for mcol_4648
Revert "Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency."
This reverts commit 83eac11b18937ecb0b4c754dd48e4cb47310f620.
Due to backwards compatibility issues.
Refactor casting to MCS[U]Int to datatype functions.
Update regression tests to include other affected datatypes.
Apply formatting.
Refactor according to PR review
Remove redundant new constant, switch to using already existing constant.
Adapt nullstring casting to EMPTYROW markers for backwards compatibility.
Adapt tests for backward compatibility behaviour allowing text datatypes to be cast to EMPTYROW constant.
Adapt mcol641-functions test according to bug fix.
Update tests according to new expected behaviour.
Adapt tests to new understanding of issue.
Update comments/documentation for MCOL_4632 test.
Adapt to new cast limit logic.
Make bracketing consistent.
Adapt previous regression test to new expected behaviour.
This patch:
1. Properly handles the situation when the PM join result count is exceeded.
2. Adds session variable 'columnstore_max_pm_join_result_count' to control the limit.
Internal memory representation of MEDIUMINT datatype uses 24 bits. This is
true for both MariaDB server as well as ColumnStore. MCS plugin code uses
TypeHandlerSInt24 and TypeHandlerUInt24 classes to respectively convert the
binary representation of the signed and unsigned MEDIUMINT values passed by
the server to the plugin. The plugin then outputs the text representation
of these values into an open file descriptor which is piped to cpimport
for the final load into the MCS db files.
The TypeHandlerXInt24 classes were earlier incorrectly using
WriteBatchField::ColWriteBatchXInt32() functions which operate on a 4 byte
buffer. This resulted in incorrect parsing of MEDIUMINT values. As a fix,
we implement WriteBatchField::ColWriteBatchXInt24() functions which
correctly handle the 24 bit input buffer used for MEDIUMINT datatype.
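The effect of reading a 24-bit value through a 4-byte buffer can be
shown with a short Python sketch. The helper names are hypothetical and
a little-endian byte layout is assumed here; the sketch is not the
WriteBatchField code.

```python
def read_sint24(buf):
    """Decode a signed MEDIUMINT from the first 3 bytes (little-endian assumed)."""
    return int.from_bytes(buf[:3], "little", signed=True)


def read_uint24(buf):
    """Decode an unsigned MEDIUMINT from the first 3 bytes."""
    return int.from_bytes(buf[:3], "little", signed=False)


def read_sint32(buf):
    """Decode 4 bytes as a signed 32-bit integer (the old, wrong width)."""
    return int.from_bytes(buf[:4], "little", signed=True)


# MEDIUMINT -1 followed by one byte that belongs to the next field.
record = b"\xff\xff\xff" + b"\x01"

as_int24 = read_sint24(record)    # uses only the 3 bytes of the value
as_uint24 = read_uint24(record)
as_int32 = read_sint32(record)    # also swallows the neighbouring byte
```

Decoding 3 bytes yields -1 (or 16777215 unsigned), while decoding 4
bytes yields 33554431 because the neighbouring byte leaks into the
value.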
For the following query:
    select item from (
      select item from (select a as item from t1) tt
      union all
      select item from (select a as item from t1) tt
    ) ttt;
There is an if predicate in buildSimpleColFromDerivedTable() that compares
the outermost query field name (ttt.item) to the returned column list of
the inner query (tt.item) when building the returned column list of the
outermost query. In the above query example, the inner query field name
is an alias set in the innermost query and is set to "`tt`.`item`",
while the outermost query field name is set to "item". The
backticks "`" in the inner query alias prevent execution from
entering the if block which creates the SimpleColumn for the outermost
query field name. As a fix, we strip off the backticks from the inner
query alias.
Add getDecimalVal in func_round and func_truncate for getting value while filtering
MCOL-4991 Solving TRUNCATE/ROUND/CEILING functions on TIME/DATETIME/TIMESTAMP
Update func_cast.cpp
This patch improves handling of NULLs in textual fields in ColumnStore.
Previously, empty strings were considered NULLs, which could be a problem
if the data schema allows empty strings. It was also one of the major
reasons for behavior differences between ColumnStore and other engines in
the MariaDB family.
Also, this patch fixes some other bugs and incorrect behavior, for
example, the incorrect comparison "column <= ''", which evaluated to
constant True for all purposes before this patch.
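For reference, the expected semantics can be sketched in Python with
None standing for NULL. This illustrates standard SQL three-valued
comparison, not ColumnStore's implementation, and sql_le is a
hypothetical helper name.

```python
def sql_le(lhs, rhs):
    """SQL-style <=: unknown (None) if either side is NULL, else a bool."""
    if lhs is None or rhs is None:
        return None
    return lhs <= rhs


values = [None, "", "a"]

# A WHERE clause keeps a row only when the predicate is true,
# so unknown (None) and false both drop the row.
matches = [v for v in values if sql_le(v, "") is True]
```

Only the empty string satisfies "column <= ''": NULL compares as
unknown, and "a" sorts after the empty string.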