1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-04-18 21:44:02 +03:00

1040 Commits

Author SHA1 Message Date
Serguey Zefirov
38fd96a663 fix(memory leaks): MCOL-5791 - get rid of memory leaks in plugin code
There were numerous memory leaks in plugin's code and associated code.
During typical run of MTR tests it leaked around 65 megabytes of
objects. As a result they may severely affect long-lived connections.

This patch fixes (almost) all leaks found in the plugin. The exceptions
are two leaks associated with SHOW CREATE TABLE columnstore_table and
getting information of columns of columnstore-handled table. These
should be fixed on the server side and work is on the way.
2024-12-04 10:59:12 +03:00
Serguey Zefirov
0bc384d5f0 fix(ubsan): MCOL-5844 - iron out UBSAN reports
The most important fix here is the fix of possible buffer overrun in
DATEFORMAT() function. A "%W" format, repeated enough times, would
overflow the 256-bytes buffer for result. Now we use ostringstream to
construct result and we are safe.

Changes in date/time projection functions made me fix difference between
us and server behavior. The new, better behavior is reflected in changes
in tests' results.

Also, there was incorrect logic in TRUNCATE() and ROUND() functions in
computing the decimal "shift."
2024-12-02 20:18:13 +03:00
Alexey Antipovsky
6f4274760e
fix(plugin): MCOL-5703 fix server crash on 'UNION ALL VALUES' queries (#3335)
* fix(plugin): MCOL-5703 fix server crash on 'UNION ALL VALUES' queries
2024-11-10 18:31:38 +00:00
drrtuy
8ae5a3da40
Fix/mcol 5787 rgdata buffer max size dev (#3325)
* fix(rowgroup): RGData now uses uint64_t counter for the fixed sizes columns data buf.
	The buffer can utilize > 4GB RAM that is necessary for PM side join.
	RGData ctor uses uint32_t allocating data buffer.
 	This fact causes implicit heap overflow.

* feat(bytestream,serdes): BS buffer size type is uint64_t
	This necessary to handle 64bit RGData, that comes as
	a separate patch. The pair of patches would allow to
	have PM joins when SmallSide size > 4GB.

* feat(bytestream,serdes): Distribute BS buf size data type change to avoid implicit data type narrowing

* feat(rowgroup): this returns bits lost during cherry-pick. The bits lost caused the first RGData::serialize to crash a process
2024-11-09 19:44:02 +00:00
Aleksei Antipovskii
42be2cb7e0 fix(dbcon) MCOL-5812 server crash related to stored functions
Using the stored function's return value as an argument
for another function was handled incorrectly, leading
to a server crash.
2024-11-05 20:32:26 +04:00
Denis Khalikov
3489b2c542
fix(logging): Add setddldebuglevel command (#3312) 2024-09-10 19:10:42 +03:00
Leonid Fedorov
25c20bae9b MCOL-4696: get rid of boost::iequals 2024-08-21 20:45:16 +04:00
Sergei Golubchik
fa8631c6cb match the rename in the handler rows_changed->rows_stats.updated 2024-08-06 18:18:37 +04:00
Aleksei Antipovskii
70a7a01941 fix(dbcon): MCOL-4756: having not() provokes an ERROR 2013
The `NOT()` function in the HAVING clause was handled
    incorrectly, which caused the server to crash.
2024-07-31 16:33:34 +04:00
Leonid Fedorov
1d25cf3afd chore(codestyle): MCOL-5405: repace windows CRLF with virtious linux one 2024-07-26 18:01:35 +04:00
mariadb-AlanMologorsky
323c8822d5 MCOL-5587: Fix columnstore.cnf.
fix(client): Fix columnstore.cnf file

This fix changes option file to apply '--quick' option only for 'mariadb' and 'mysql' clients instead of all MariaDB clients.
Otherwise 'mysqladmin' uses this option, but it doesn't exist. As a result broken CI multinode MTR stage.
2024-07-26 17:16:31 +04:00
Sergey Zefirov
f5089c7d80
fix(client): MCOL-5587: enable quick mode for predictable performance (#3240)
This changeset enables quick (mariadb -q) mode when columnstore is
installed. Quick mode precludes client CLI program from storing too
much data in memory, preventing out of memory conditions.
2024-07-07 13:52:21 +01:00
Leonid Fedorov
a1e64d4cb0 bug(priproc) make last_day type a bit more accurate
This fixes discrepance with the server, which assigns DATE type to
last_day()'s result.

Now we also assigns DATE result type and, also, use proper
dataconvert::Day data structure to return date.

Tests agree with InnoDB.

Also, this patch includes test for MCOL-5669, to show we fixed it.
2024-07-01 16:25:44 +03:00
Sergey Zefirov
7ec8f3df9a
MCOL-5772: incorrect ORDER BY ordering for a columns not in GROUP BY (#3214)
When ORDER BY column is not in GROUP BY, is not an aggregate and there
is a SELECT column that is also not an aggregate, there was a problem:
ordering happened on the SELECTed column, not ORDERed one.

This patch fixes that particular problem and also performs some tidying
around newly added aggregate.
2024-06-25 16:10:27 +04:00
Sergey Zefirov
1122b64cb1
MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194)
This patch fixes the problem in MCOL-4234 and also generally improves
behavior of GROUP BY.

It does so by introducing a "dummy" aggregate and by wrapping columns
into it. This allows for columns that are not in GROUP BY to be used
more freely, for example, in SELECT * FROM tbl GROUP BY col - all
columns that are not "col" will be wrapped into an aggregate and query
will proceed to execution.

The dummy aggregate itself does nothing more than remember last value
passed into it.

There also an additional error message that tries to explain what types
of expressions can be wrapped into an aggregate.
2024-06-17 20:00:54 +03:00
Sergey Zefirov
49541993f4
fix(join): Fixes MCOL-5056, an error of joining TEXT column from InnoDB (#3160)
We incorrectly identified TEXT columns from external tables as BLOB.
Alexander Barkov suggested a way to discriminate them which I
implemented here.
2024-05-15 16:04:10 +01:00
Sergey Zefirov
5b9ddd902e
feat(ddl): MCOL-5744: better handling of utf8 charset aliases (#3174)
Server expands ut8_XXX aliases to utf8mb3_XXX or utf8mb4_XXX depending
on the UTF8_IS_UTF8MB3 setting in the OLD_MODE environment variable.

Server already has the necessary code implemented in the get_utf8_flag()
method of class THD. There are several uses of this flag and all we have
to do to be in line with server is to use it.

This patch does that for DDL as work on MCOL-5705 uncovered some
problems in that area.
2024-05-10 17:17:57 +01:00
Leonid Fedorov
71185efe54 Fixed review notices, added the loop over selects, to collect error on more tnan two selects 2024-04-18 18:31:30 +03:00
Leonid Fedorov
8efdee6eca apply clang-format 2024-04-18 18:31:30 +03:00
Leonid Fedorov
904ac415e4 fix(plugin) MCOL-5699: throw error for unimplemented INTERSECT and EXCEPT 2024-04-18 18:31:30 +03:00
Leonid Fedorov
a8d3fff79e chore(build) Rocky8 gcc vanilla build fix 2024-04-16 17:08:06 +03:00
Serguey Zefirov
3b7e69135d Fixes MCOL-5700, Oracle mode test results
This changeset contains fixes in Oracle mode tests and for the
implementation of the CONCAT_ORACLE. Also, we harmonise our
translation process with the recent changes in the server.

Due to changed behavior of the server, some CREATE VIEW/EXPLAIN
statements' results begun to output unexpected results and need to be
fixed.

Also, concatenation operation's name also changed. This lead to
disabled func_concat_oracle test to be enabled to test it and it
turned out that our implementation of this function was broken
and need to be fixed too.
2024-04-15 19:35:21 +03:00
Denis Khalikov
77cd733a6d
fix(plugin): MCOL-5236 Take Item from Ref_Item for group by list. (#3162) 2024-04-01 14:13:39 +03:00
Leonid Fedorov
af5ae35413
Revert "Fixes MCOL-5700, Oracle mode test results" 2024-03-27 18:52:30 +04:00
mariadb-KirillPerov
56b35d5cf6
Merge pull request #3156 from mariadb-corporation/sz-fix-oracle-mode
Fixes MCOL-5700, Oracle mode test results
2024-03-27 14:45:52 +06:00
Serguey Zefirov
34acd3559b Fixes MCOL-5700, Oracle mode test results
This changeset contains fixes in Oracle mode tests and for the
implementation of the CONCAT_ORACLE. Also, we harmonise our
translation process with the recent changes in the server.

Due to changed behavior of the server, some CREATE VIEW/EXPLAIN
statements' results begun to output unexpected results and need to be
fixed.

Also, concatenation operation's name also changed. This lead to
disabled func_concat_oracle test to be enabled to test it and it
turned out that our implementation of this function was broken
and need to be fixed too.
2024-03-27 10:00:39 +03:00
Leonid Fedorov
7a2ca9d6bc
MCOL-4480: TEXT type added (#3142)
* TEXT type added
* tests
2024-03-21 00:26:35 +04:00
Leonid Fedorov
5f40fb32d0
MCOL-5328: use PCRE2 and JPCRE wrapper (#3137)
PCRE2 for regexp functions in columnstore
2024-03-14 19:39:29 +04:00
Sergey Zefirov
c01e1f4ed8
Use of newly introduced schema-based name resolution for (#3138)
Oracle-compatible functions
Server changed the way to resolve functions' names and we need to adapt.
2024-03-11 19:17:46 +04:00
Leonid Fedorov
c6e9b7d448
MCOL-5624: dont force columnstore_use_import_for_batchinsert option to be required to start mariadb server (#3078) 2023-12-26 15:12:01 +04:00
Sergey Zefirov
9a84aa8d99
fix(plugin): Same columns fom different views in GROUP BY do not produce errors (#3035)
Fixes MCOL-5643.

The problem was that different views with same column names in GROUP BY
and on the SELECT clause produced an error about "projection column is
not an aggergate neither in GROUP BY list."

This was due to incorrect search in expressions's list that lead to
duplicate columns in GROUP BY list.
2023-11-28 17:30:56 +03:00
drrtuy
26f5f8fe5c
fix(plugin): this is to addres the original patch QA found in the original patch 2023-11-22 17:20:37 +03:00
Roman Nozdrin
6579180810 fix(plugin): MCOL-4740: This fixes update rows counter for multi-table update
For UPDATEs involving a single table, the server call to handler::direct_update_rows() is used to correctly set the count for the number of updated rows in the UPDATE statement.
However, for UPDATEs involving multi-tables, the server does not call handler::direct_update_rows(). This patch adds support to correctly report the number of updated rows to the client by setting
multi_update::updated and multi_update::found in handler::rnd_end().
2023-11-02 14:18:06 +00:00
Sergey Zefirov
84148cbe4c
fix(datatypes, funcexp): Overflow detection for MCOL-5568 use case (and some other) (#2987)
We add intermediate calculations in int128_t when target is UBIGINT and
check for overflow before converting into the UBIGINT. This is so
because we can overflow on addition and multiplication, with (some)
signed operands or both unsigned.
2023-10-16 16:55:02 +03:00
Sergey Zefirov
920607520c
feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support
Adds a special column which helps to differentiate data and rollups of
various depts and a simple logic to row aggregation to add processing of
subtotals.
2023-09-26 17:01:53 +03:00
Leonid Fedorov
5013717730
fix(plugin): Fix wrong ask for stat call for table mode 2023-09-26 14:43:06 +03:00
Andrey Piskunov
d586975da7
Rename a limit var + change error message (#2946)
* Rename a limit var + change error message

* Adjust the test
2023-09-05 12:19:15 +03:00
mariadb-AndreyPiskunov
05547f2342 Add a limit (as runtime value) for long in queries 2023-08-21 10:38:46 +03:00
drrtuy
f55d41c079
Merge pull request #2912 from tntnatbry/MCOL-5005
MCOL-5005 Add charset number to system catalog.
2023-08-15 22:22:21 +02:00
Gagan Goel
d50a0fa2e6 MCOL-5005 Add charset number to system catalog - Part 2.
1. Extend the calpontsys.syscolumn system catalog table
  with a new column, 'charsetnum'.

  'charsetnum' field is set to the 'number' member of the
  'charset_info_st' struct defined in the server in m_ctype.h.

  For CHAR/VARCHAR/TEXT column types, 'charset_info_st' is
  initialized to the charset/collation of the column, which
  is set at the column-level or at the table-level in the DDL.

  For BLOB/VARBINARY binary column types, 'charset_info_st' is
  initialized to my_charset_bin (charsetnum=63).

  For all other column types, charsetnum is set to 0.

  2. Add support for the newly added 'charsetnum' column in the
  automatic system catalog upgrade logic in dbbuilder.

  For existing table definitions, charsetnum for the column is
  defaulted to 0.

  3. Add MTR test case that creates a few table definitions with
  a range of charset/collation combinations and queries the
  calpontsys.syscolumn system catalog table with the charsetnum
  field for the columns in the table DDLs.
2023-08-15 17:21:47 +00:00
mariadb-AlexeyVorovich
64f1d541d0
MCOL-5519: new defaults in columnstore.cnf (#2894)
feat(charset)!: utf8 is a new charset default and utf8_general_ci is a new collation default in the engine configuration file shipped
---------

Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>
Co-authored-by: mariadb-DanielLee <daniel.lee@mariadb.com>
2023-08-15 18:04:32 +03:00
Denis Khalikov
896e8dd769
MCOL-5522 Properly process pm join result count. (#2909)
This patch:
1. Properly processes situation when pm join result count is exceeded.
2. Adds session variable 'columnstore_max_pm_join_result_count` to control the limit.
2023-08-04 16:55:45 +03:00
Gagan Goel
4f580d109d
Fix a compiler error related to signed v/s unsigned integer comparison. (#2915) 2023-08-04 16:54:40 +03:00
Gagan Goel
a36ea6dbb4 MCOL-5005 Add charset number to system catalog - Part 1.
This patch improves/fixes the existing handling of CHARSET and
COLLATION symbols in the ColumnStore DDL parser.

Also, add fCollate and fCharsetNum member variables to the
ddlpackage::ColumnType class.
2023-07-28 18:36:53 -04:00
Leonid Fedorov
65cde8c894
feature: pron (#2908)
* feature: Special dictionary, we can pass with session veriable to modify codepaths and behaviour for testing and debugging
2023-07-21 14:02:03 +03:00
Denis Khalikov
1f190a6e75 MCOL-5477 Disk join step improvement.
This patch:
1. Handles corner case when the bucket exceeded the memory limit, but we cannot redistribute the data in this bucket into new buckets based on a hash algorithm, because the rows have the same values.
2. Adds force option for disk join step.
3. Add a option to contol the depth of the partition tree.
2023-06-23 18:40:15 +03:00
Leonid Fedorov
8f93fc3623
MCOL-5493: First portion of UBSan fixes (#2842)
Multiple UB fixes
2023-06-02 17:02:09 +03:00
Gagan Goel
c598a9bbed MCOL-5480 LOAD DATA INFILE incorrectly loads values for MEDIUMINT datatype.
Internal memory representation of MEDIUMINT datatype uses 24 bits. This is
true for both MariaDB server as well as ColumnStore. MCS plugin code uses
TypeHandlerSInt24 and TypeHandlerUInt24 classes to respectively convert the
binary representation of the signed and unsigned MEDIUMINT values passed by
the server to the plugin. The plugin then outputs the text representation
of these values into an open file descriptor which is piped to cpimport
for the final load into the MCS db files.

The TypeHandlerXInt24 classes were earlier incorrectly using
WriteBatchField::ColWriteBatchXInt32() functions which operate on a 4 byte
buffer. This resulted in incorrect parsing of MEDIUMINT values. As a fix,
we implement WriteBatchField::ColWriteBatchXInt24() functions which
correctly handle the 24 bit input buffer used for MEDIUMINT datatype.
2023-05-23 16:00:05 -04:00
Gagan Goel
1477b28ee9 MCOL-5357 Fix TPC-DS query error "MCS-3009: Unknown column '.<colname>'".
For the following query:

select item from (
select item from (select a as item from t1) tt
union all
select item from (select a as item from t1) tt
) ttt;

There is an if predicate in buildSimpleColFromDerivedTable() that compares
the outermost query field name (ttt.item) to the returned column list of
the inner query (tt.item) when building the returned column list of the
outer most query. In the above query example, the inner query field name
is an alias set in the inner most query and is set to "`tt`.`item`",
while the outermost query field name is set to "item". The use of
backticks "`" in the inner query alias is causing the execution to
not enter the if block which creates the SimpleColumn for the outermost
query field name. As a fix, we strip off the backticks from the inner
query alias.
2023-05-03 16:06:20 +00:00
Gagan Goel
0be1c3dc8f MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing.
1. Input and output RowGroup's used in GROUP_CONCAT classes
are currently allocating a raw memory buffer of size equal
to the actual width of the string datatype. As an example,
for the following query:
  SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with default width, the input
RowGroup containing the target rows to be concatenated will
assign 64kb of memory for every input row in the RowGroup.
This is wasteful as actual field values in real workloads
would be much smaller. We fix this by enabling the
RowGroup to use the StringStore when the RowGroup contains
long strings.

2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer is equal to the
row size for the output RowGroup. For the above scenario,
using the default group_concat_max_len (which is a server
variable that sets the maximum length of the GROUP_CONCAT string)
value of 1mb, the buffer size would be
(1mb + 64kb + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3gb, this buffer
size would be ~3gb. Now if the runtime initiates several
instances of RowAggregation, total memory consumption by
PrimProc could exceed the hardware memory limits causing the
OS OOM to kill the process. We fix this problem by again
enabling the StringStore for the NULL row allocation.

3. In the plugin code in buildAggregateColumn(), there is
an integer overflow when the server group_concat_max_len
variable (which is an uint32_t) is set to a value > INT32_MAX
(such as 3gb) and is assigned to
CalpontSystemCatalog::ColType::colWidth (which is an int32_t).
As a short term fix, we saturate the assigned value to colWidth
to INT32_MAX. Proper fix would be to upgrade
CalpontSystemCatalog::ColType::colWidth to an uint32_t.
2023-05-01 13:06:23 -04:00