1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00
Commit Graph

162 Commits

Author SHA1 Message Date
70547c7358 chore(plugin): translator walks are now in separate units 2025-06-27 17:38:33 +04:00
e57832ee64 feat(optimizer): temporary shield optimizer with a session variable 2025-06-26 18:35:33 +01:00
25c7d23c21 feat(optimizer): add session switch to optionally enable optimizer 2025-06-26 18:35:33 +01:00
ab6063bec4 feat(optimizer): moved related code into a separate unit 2025-06-26 18:35:33 +01:00
e07e85b750 feat(optimizer): into derived CSEP rewrite with hardcoded tables 2025-06-26 18:35:33 +01:00
1baaf878d0 feat(optimizer): basic rewrite Union unit into Sub with union 2025-06-26 18:35:33 +01:00
021a95c683 feat(optimizer): rewrite rule refactoring 2025-06-26 18:35:33 +01:00
e73e5834ab feat(optimizer): first cut for rewrite foreign table into UNION rule 2025-06-26 18:35:33 +01:00
79008f4f69 feat(CSEP): CSEP printer with indentations to simplify reading + rewriter skeleton + some test binary to describe minimalistic CSEP localy 2025-06-26 18:35:33 +01:00
3a91cded27 chore(MCOL-6018) Fix incorrect Field_decimal cast
This is a fix of a problem found by UBSAN. MDB changed default type to
represent a decimal result, C-style cast did not do proper type checking
and this one-liner fixes that. Now we will have an assertion if type
changes again.
2025-06-26 19:41:58 +04:00
44d1698639 chore(plugin): move having and group by into separate routines 2025-06-02 12:11:41 +01:00
600f10c259 chore(plugin): move order by processing 2025-06-02 12:11:41 +01:00
bb13688ccf chore(plugin): move projection processing into a separate part. 2025-06-02 12:11:41 +01:00
dc4ca8d588 MCOL-5943: MCOL-4740 update rows counter for multi-table update (#3555)
* fix(plugin): MCOL-4740: This fixes update rows counter for multi-table update
For UPDATEs involving a single table, the server call to handler::direct_update_rows() is used to correctly set the count for the number of updated rows in the UPDATE statement.
However, for UPDATEs involving multi-tables, the server does not call handler::direct_update_rows(). This patch adds support to correctly report the number of updated rows to the client by setting
multi_update::updated and multi_update::found in handler::rnd_end().

* fix(plugin): MCOL-4740: this is to addres the original patch QA found in the original patch

---------

Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
Co-authored-by: drrtuy <roman.nozdrin@mariadb.com>
2025-05-29 14:23:37 +01:00
221ccfd5b6 fix(dbcon): MCOL-4756: having not() provokes an ERROR 2013
The `NOT()` function in the HAVING clause was handled
    incorrectly, which caused the server to crash.
2025-05-23 05:12:17 +04:00
bfe49a8345 bug(priproc) make last_day type a bit more accurate
This fixes discrepance with the server, which assigns DATE type to
last_day()'s result.

Now we also assigns DATE result type and, also, use proper
dataconvert::Day data structure to return date.

Tests agree with InnoDB.

Also, this patch includes test for MCOL-5669, to show we fixed it.
2025-05-23 05:12:17 +04:00
5f6080e09c fix(join): Fixes MCOL-5056, an error of joining TEXT column from InnoDB (#3160)
We incorrectly identified TEXT columns from external tables as BLOB.
Alexander Barkov suggested a way to discriminate them which I
implemented here.
2025-05-23 05:12:17 +04:00
0359a1fd3a chore(server_support): fixes to build columnstore branch with server >= 11.5 2025-05-22 17:14:38 +04:00
a0bee173f6 chore(build): fixes to satisfy clang19 warnings 2025-05-15 19:05:38 +04:00
bd1622f331 feat(MCOL-5886): support InnoDB's table partitions in cross-engine joins
The purpose of this changeset is to obtain list of partitions from
SELECT_LEX structure and pass it down to joblist and then to
CrossEngineStep to pass to InnoDB.
2025-04-23 08:24:10 +03:00
6a712dc0ad MCOL-5932: fix heap buffer overflow with minimal revert of MCOL-5776 breaking change (#3445) 2025-04-15 14:46:25 +01:00
4bea7e59a0 feat(PrimProc): MCOL-5852 disk-based GROUP_CONCAT & JSON_ARRAYAGG
* move GROUP_CONCAT/JSON_ARRAYAGG storage to the RowGroup from
  the RowAggregation*
* internal data structures (de)serialization
* get rid of a specialized classes for processing JSON_ARRAYAGG
* move the memory accounting to disk-based aggregation classes
* allow aggregation generations to be used for queries with
  GROUP_CONCAT/JSON_ARRAYAGG
* Remove the thread id from the error message as it interferes with the mtr
2025-04-11 15:21:07 +02:00
6b8adb822b chore(connector): remove unused and disabled group by handler (#3481) 2025-04-04 21:27:07 +01:00
6e539b8336 fix(MCOL-5889): Improper handle of DOUBLE result type with DECIMAL arguments
Sometimes server assigns DOUBLE type for arithmetic operations over
DECIMAL arguments. In this rare case width of result was incorrectly
adjusted and it triggered an assertion.

Now width of result gets adjusted only if result type is also DECIMAL.
2025-02-12 18:41:55 +04:00
60dc7550f1 fix(group by, having): MCOL-5776: GROUP BY/HAVING closer to server's (#3371)
This patch introduces an internal aggregate operator SELECT_SOME that
is automatically added to columns that are not in GROUP BY. It
"computes" some plausible value of the column (actually, last one
passed).

Along the way it fixes incorrect handling of HAVING being transferred
into WHERE, window function handling and a bit of other inconsistencies.
2024-12-20 19:11:47 +00:00
39a976c39a fix(ubsan): MCOL-5844 - iron out UBSAN reports
The most important fix here is the fix of possible buffer overrun in
DATEFORMAT() function. A "%W" format, repeated enough times, would
overflow the 256-bytes buffer for result. Now we use ostringstream to
construct result and we are safe.

Changes in date/time projection functions made me fix difference between
us and server behavior. The new, better behavior is reflected in changes
in tests' results.

Also, there was incorrect logic in TRUNCATE() and ROUND() functions in
computing the decimal "shift."
2024-12-10 20:30:58 +04:00
3bcc2e2fda fix(memory leaks): MCOL-5791 - get rid of memory leaks in plugin code (#3365)
There were numerous memory leaks in plugin's code and associated code.
During typical run of MTR tests it leaked around 65 megabytes of
objects. As a result they may severely affect long-lived connections.

This patch fixes (almost) all leaks found in the plugin. The exceptions
are two leaks associated with SHOW CREATE TABLE columnstore_table and
getting information of columns of columnstore-handled table. These
should be fixed on the server side and work is on the way.
2024-12-06 09:04:55 +00:00
6f6e69815d feat(bytestream,serdes): Distribute BS buf size data type change to avoid implicit data type narrowing 2024-11-08 16:28:51 +04:00
db4cb1d657 MCOL-4234 and MCOL 5772 cherry-picked into [stable 23.10] (#3226)
* MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194)

This patch fixes the problem in MCOL-4234 and also generally improves
behavior of GROUP BY.

It does so by introducing a "dummy" aggregate and by wrapping columns
into it. This allows for columns that are not in GROUP BY to be used
more freely, for example, in SELECT * FROM tbl GROUP BY col - all
columns that are not "col" will be wrapped into an aggregate and query
will proceed to execution.

The dummy aggregate itself does nothing more than remember last value
passed into it.

There also an additional error message that tries to explain what types
of expressions can be wrapped into an aggregate.

* MCOL-5772: incorrect ORDER BY ordering for a columns not in GROUP BY (#3214)

When ORDER BY column is not in GROUP BY, is not an aggregate and there
is a SELECT column that is also not an aggregate, there was a problem:
ordering happened on the SELECTed column, not ORDERed one.

This patch fixes that particular problem and also performs some tidying
around newly added aggregate.

---------

Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>
2024-06-28 00:31:53 +04:00
88e3af4ba5 fix(plugin): MCOL-5236 Take Item from Ref_Item for group by list (#3162) (#3163)
Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>
2024-06-27 17:25:57 +04:00
1b31fd3bdb fix(plugin) MCOL-5699: throw error for unimplemented INTERSECT and EXCEPT (#3219) 2024-06-27 14:20:23 +04:00
6c6fa7d5a4 MCOL-5328: PCRE based regexp regexp_substr regexp_instr regexp_replace [stable-23.10] (#3215)
* MCOL-5328: PCRE based regexp regexp_substr regexp_instr regexp_replace

* Add qa test for MCOL-5328

---------

Co-authored-by: Susil Behera <susil.behera@mariadb.com>
2024-06-27 14:20:08 +04:00
97220501ed Fixes MCOL-5700, Oracle mode test results
This changeset contains fixes in Oracle mode tests and for the
implementation of the CONCAT_ORACLE. Also, we harmonise our translation
process with the recent changes in the server.

Due to changed behavior of the server, some CREATE VIEW/EXPLAIN
statements' results begun to output unexpected results and need to be
fixed.

Also, concatenation operation's name also changed. This lead to disabled
func_concat_oracle test to be enabled to test it and it turned out that
our implementation of this function was broken and need to be fixed too.
2024-04-15 19:35:47 +03:00
982b0084db Use of newly introduced schema-based name resolution for (#3138) (#3139)
Oracle-compatible functions
Server changed the way to resolve functions' names and we need to adapt.
2024-03-12 15:21:53 +04:00
ab1a3fe471 Same columns fom diffeernt views in GROUP BY do not produce errors
Fixes MCOL-5643.

The problem was that different views with same column names in GROUP BY
and on the SELECT clause produced an error about "projection column is
not an aggergate neither in GROUP BY list."

This was due to incorrect search in expressions's list that lead to
duplicate columns in GROUP BY list.
2023-11-30 01:42:41 +04:00
b826fc1fd6 fix(datatypes, funcexp): Overflow detection for MCOL-5568 use case (and some other) (#2987)
We add intermediate calculations in int128_t when target is UBIGINT and
check for overflow before converting into the UBIGINT. This is so
because we can overflow on addition and multiplication, with (some)
signed operands or both unsigned.
2023-10-25 20:14:38 +03:00
920607520c feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support
Adds a special column which helps to differentiate data and rollups of
various depts and a simple logic to row aggregation to add processing of
subtotals.
2023-09-26 17:01:53 +03:00
d586975da7 Rename a limit var + change error message (#2946)
* Rename a limit var + change error message

* Adjust the test
2023-09-05 12:19:15 +03:00
05547f2342 Add a limit (as runtime value) for long in queries 2023-08-21 10:38:46 +03:00
896e8dd769 MCOL-5522 Properly process pm join result count. (#2909)
This patch:
1. Properly processes situation when pm join result count is exceeded.
2. Adds session variable 'columnstore_max_pm_join_result_count` to control the limit.
2023-08-04 16:55:45 +03:00
4f580d109d Fix a compiler error related to signed v/s unsigned integer comparison. (#2915) 2023-08-04 16:54:40 +03:00
65cde8c894 feature: pron (#2908)
* feature: Special dictionary, we can pass with session veriable to modify codepaths and behaviour for testing and debugging
2023-07-21 14:02:03 +03:00
1f190a6e75 MCOL-5477 Disk join step improvement.
This patch:
1. Handles corner case when the bucket exceeded the memory limit, but we cannot redistribute the data in this bucket into new buckets based on a hash algorithm, because the rows have the same values.
2. Adds force option for disk join step.
3. Add a option to contol the depth of the partition tree.
2023-06-23 18:40:15 +03:00
8f93fc3623 MCOL-5493: First portion of UBSan fixes (#2842)
Multiple UB fixes
2023-06-02 17:02:09 +03:00
1477b28ee9 MCOL-5357 Fix TPC-DS query error "MCS-3009: Unknown column '.<colname>'".
For the following query:

select item from (
select item from (select a as item from t1) tt
union all
select item from (select a as item from t1) tt
) ttt;

There is an if predicate in buildSimpleColFromDerivedTable() that compares
the outermost query field name (ttt.item) to the returned column list of
the inner query (tt.item) when building the returned column list of the
outer most query. In the above query example, the inner query field name
is an alias set in the inner most query and is set to "`tt`.`item`",
while the outermost query field name is set to "item". The use of
backticks "`" in the inner query alias is causing the execution to
not enter the if block which creates the SimpleColumn for the outermost
query field name. As a fix, we strip off the backticks from the inner
query alias.
2023-05-03 16:06:20 +00:00
0be1c3dc8f MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing.
1. Input and output RowGroup's used in GROUP_CONCAT classes
are currently allocating a raw memory buffer of size equal
to the actual width of the string datatype. As an example,
for the following query:
  SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1;
If col2 is a TEXT field with default width, the input
RowGroup containing the target rows to be concatenated will
assign 64kb of memory for every input row in the RowGroup.
This is wasteful as actual field values in real workloads
would be much smaller. We fix this by enabling the
RowGroup to use the StringStore when the RowGroup contains
long strings.

2. RowAggregation::initialize() allocates a memory buffer
for a NULL row. The size of this buffer is equal to the
row size for the output RowGroup. For the above scenario,
using the default group_concat_max_len (which is a server
variable that sets the maximum length of the GROUP_CONCAT string)
value of 1mb, the buffer size would be
(1mb + 64kb + some additional metadata). If the user sets
group_concat_max_len to a higher value, say 3gb, this buffer
size would be ~3gb. Now if the runtime initiates several
instances of RowAggregation, total memory consumption by
PrimProc could exceed the hardware memory limits causing the
OS OOM to kill the process. We fix this problem by again
enabling the StringStore for the NULL row allocation.

3. In the plugin code in buildAggregateColumn(), there is
an integer overflow when the server group_concat_max_len
variable (which is an uint32_t) is set to a value > INT32_MAX
(such as 3gb) and is assigned to
CalpontSystemCatalog::ColType::colWidth (which is an int32_t).
As a short term fix, we saturate the assigned value to colWidth
to INT32_MAX. Proper fix would be to upgrade
CalpontSystemCatalog::ColType::colWidth to an uint32_t.
2023-05-01 13:06:23 -04:00
2e1394149b MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792)
* Fixes of bugs from ASAN warnings, part one

* MQC as static library, with nifty counter for global map and mutex

* Switch clang to 16

* link messageqcpp to execplan
2023-04-04 02:33:23 +03:00
b53c231ca6 MCOL-271 empty strings should not be NULLs (#2794)
This patch improves handling of NULLs in textual fields in ColumnStore.
Previously empty strings were considered NULLs and it could be a problem
if data scheme allows for empty strings. It was also one of major
reasons of behavior difference between ColumnStore and other engines in
MariaDB family.

Also, this patch fixes some other bugs and incorrect behavior, for
example, incorrect comparison for "column <= ''" which evaluates to
constant True for all purposes before this patch.
2023-03-30 21:18:29 +03:00
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
4d4e4ad30d Merge pull request #2741 from mariadb-corporation/MDEV-25080-CS-dev
MDEV-25080 Allow pushdown of queries involving UNIONs in outer select to ColumnStore
2023-02-28 11:23:50 +00:00