1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-10-31 18:30:33 +03:00
Commit Graph

60 Commits

Author SHA1 Message Date
drrtuy
b73c245ecc fix(rbo,QA): removed duplicated symbols 2025-09-08 15:20:31 +00:00
drrtuy
d981f96b2a Merge branch 'stable-23.10' into feat/MCOL-6072-parallel-scan-4-CES-4 2025-09-08 15:16:52 +00:00
Aleksei Antipovskii
76f135b2ac Code cleanup and deduplication 2025-09-05 17:07:21 +04:00
Aleksei Antipovskii
3a316181e7 feat(joblist): MCOL-5756 MCOL-5222 ORDER BY on UNIONs in outer select 2025-09-05 17:07:21 +04:00
Leonid Fedorov
6ec363af70 MCOL-6145: mcsgetplan() UDF for CSEP printing 2025-09-02 12:51:39 +01:00
drrtuy
8962ec4830 Merge branch 'stable-23.10' into feat/MCOL-6072-parallel-scan-4-CES-4 2025-08-21 16:56:11 +00:00
Leonid Fedorov
56d04cb711 MCOL-6018: cleanup stacks on error 2025-08-18 15:47:06 +04:00
Leonid Fedorov
14fe4401bc chore(rbo): MCOL-6143: Settle down rbo as a separate lib for Unittesting (#3708)
* fix builds

* MCOL-6143: rbo as a separate library

* Move get_unstable_optimizer out of rbo, move findStatisticsForATable to header bo be (dirty fix, but cuts the corner)

* Add some helpers to a headerfile for unittesting

* Simple unittests with some mocks
2025-08-15 12:28:28 +01:00
drrtuy
c030ff4224 feat(rbo,rules,QA): index column type is now derived from the corresponding Field 2025-08-13 13:20:57 +00:00
drrtuy
e167082497 feat(rbo,rules,QA): refactored statistics storage 2025-08-13 13:20:57 +00:00
drrtuy
ebaa6cce5c feat(rbo,rules): preparation to replace derived-based with table-based approach 2025-08-13 13:11:46 +00:00
drrtuy
595b1baad2 feat(rbo,rules): change rule matching part 2025-08-13 13:07:32 +00:00
Leonid Fedorov
82421c208f chore(ci): collect asan ubsan and libc++ build with mtr and regression status ignored (#3672)
* MSan added with fixes for libc++

* libc++ sepatare build

* add libc++ to ci

* libstdc++ in CI

* libcpp and msan to external projects

* std::sqrt

* awful_hack(ci): install whole llvm instead of libc++ in terrible way for test containers

* Adding ddeb packages for teststages and repos

* libc++ more for test container

* save some money on debug

* colored coredumps

* revert ci

* chore(ci): collect asan ubsan and libc++ build with mtr and regression status ignored
2025-07-31 00:32:32 +04:00
drrtuy
f881bae496 chore(rbo,rules): fixes to compile MCS with ES 10.6 2025-07-21 12:54:07 +01:00
drrtuy
e600f11aa9 feat(rbo,rules): use EI statistics for filter ranges 2025-07-21 12:54:07 +01:00
drrtuy
15be33fbc5 feat(rbo,rules): refactored statistics storage in gwi and implemented statistics based UNION rewrite. 2025-07-21 12:54:07 +01:00
drrtuy
3f9ce7779e feat(optimizer): PoC for EI stats retrieval in getSelectPlan() 2025-07-21 12:54:07 +01:00
drrtuy
dfddfedfe5 feat(optimizer): collect EI statistics for a first column in existing tables indexes 2025-07-21 12:54:07 +01:00
Leonid Fedorov
aa7e0fb9b4 Deep build refactoring phase 1 (#3562)
* configcpp refactored
* logging and datatypes refactored

* more dataconvert
* chore(build): massive removals, auto add files to debian install file
* chore(codemanagement): nodeps headers, potentioal library
* chore(build): configure before autobake
* chore(build): use custom cmake commands for components, mariadb-plugin-columnstore.install generated
* chore(build): install deps as separate step for build-packages
* more deps
* check  debian/mariadb-plugin-columnstore.install automatically
* chore(build): add option for multibracnh compilation
* Fix warning
2025-05-30 14:05:21 +04:00
Leonid Fedorov
dc4ca8d588 MCOL-5943: MCOL-4740 update rows counter for multi-table update (#3555)
* fix(plugin): MCOL-4740: This fixes update rows counter for multi-table update
For UPDATEs involving a single table, the server call to handler::direct_update_rows() is used to correctly set the count for the number of updated rows in the UPDATE statement.
However, for UPDATEs involving multi-tables, the server does not call handler::direct_update_rows(). This patch adds support to correctly report the number of updated rows to the client by setting
multi_update::updated and multi_update::found in handler::rnd_end().

* fix(plugin): MCOL-4740: this is to addres the original patch QA found in the original patch

---------

Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>
Co-authored-by: drrtuy <roman.nozdrin@mariadb.com>
2025-05-29 14:23:37 +01:00
drrtuy
6b8adb822b chore(connector): remove unused and disabled group by handler (#3481) 2025-04-04 21:27:07 +01:00
Aleksei Antipovskii
cbf4fef47c more overrides 2025-02-21 20:01:34 +04:00
Aleksei Antipovskii
0ab03c7258 chore(codestyle): mark virtual methods as override 2025-02-21 20:01:34 +04:00
Sergey Zefirov
60dc7550f1 fix(group by, having): MCOL-5776: GROUP BY/HAVING closer to server's (#3371)
This patch introduces an internal aggregate operator SELECT_SOME that
is automatically added to columns that are not in GROUP BY. It
"computes" some plausible value of the column (actually, last one
passed).

Along the way it fixes incorrect handling of HAVING being transferred
into WHERE, window function handling and a bit of other inconsistencies.
2024-12-20 19:11:47 +00:00
Sergey Zefirov
3bcc2e2fda fix(memory leaks): MCOL-5791 - get rid of memory leaks in plugin code (#3365)
There were numerous memory leaks in plugin's code and associated code.
During typical run of MTR tests it leaked around 65 megabytes of
objects. As a result they may severely affect long-lived connections.

This patch fixes (almost) all leaks found in the plugin. The exceptions
are two leaks associated with SHOW CREATE TABLE columnstore_table and
getting information of columns of columnstore-handled table. These
should be fixed on the server side and work is on the way.
2024-12-06 09:04:55 +00:00
Leonid Fedorov
4b411b3968 MCOL-4696: get rid of boost::iequals 2024-08-21 21:35:52 +04:00
Sergey Zefirov
db4cb1d657 MCOL-4234 and MCOL 5772 cherry-picked into [stable 23.10] (#3226)
* MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194)

This patch fixes the problem in MCOL-4234 and also generally improves
behavior of GROUP BY.

It does so by introducing a "dummy" aggregate and by wrapping columns
into it. This allows for columns that are not in GROUP BY to be used
more freely, for example, in SELECT * FROM tbl GROUP BY col - all
columns that are not "col" will be wrapped into an aggregate and query
will proceed to execution.

The dummy aggregate itself does nothing more than remember last value
passed into it.

There also an additional error message that tries to explain what types
of expressions can be wrapped into an aggregate.

* MCOL-5772: incorrect ORDER BY ordering for a columns not in GROUP BY (#3214)

When ORDER BY column is not in GROUP BY, is not an aggregate and there
is a SELECT column that is also not an aggregate, there was a problem:
ordering happened on the SELECTed column, not ORDERed one.

This patch fixes that particular problem and also performs some tidying
around newly added aggregate.

---------

Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>
2024-06-28 00:31:53 +04:00
Leonid Fedorov
8f93fc3623 MCOL-5493: First portion of UBSan fixes (#2842)
Multiple UB fixes
2023-06-02 17:02:09 +03:00
Sergey Zefirov
b53c231ca6 MCOL-271 empty strings should not be NULLs (#2794)
This patch improves handling of NULLs in textual fields in ColumnStore.
Previously empty strings were considered NULLs and it could be a problem
if data scheme allows for empty strings. It was also one of major
reasons of behavior difference between ColumnStore and other engines in
MariaDB family.

Also, this patch fixes some other bugs and incorrect behavior, for
example, incorrect comparison for "column <= ''" which evaluates to
constant True for all purposes before this patch.
2023-03-30 21:18:29 +03:00
Leonid Fedorov
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
Gagan Goel
45a779f743 MDEV-25080 Implement ColumnStore-side changes for pushdown of SELECT_LEX_UNITs. 2023-02-27 06:38:31 -05:00
Sam James
20b5dbb617 Add missing includes
These seem to have all fallen out of a recent Boost update to 1.81 which
dropped some internal includes. All of these uses within columnstore
relied on these transitive includes, so explicitly include what we need
to fix build.

Signed-off-by: Sam James <sam@gentoo.org>
2023-01-17 01:18:41 +00:00
David.Hall
08bef648b3 Mcol 5074 Case with In and aggregates asserts (#2435)
* MCOL-5074 CASE with IN and aggregate asserts
gwip-scsp wasn't set and buildPredicateItem() was called which assumes it is set. Added code to set properly in this case
2022-07-11 16:20:15 -05:00
Gagan Goel
973e5024d8 MCOL-4957 Fix performance slowdown for processing TIMESTAMP columns.
Part 1:
 As part of MCOL-3776 to address synchronization issue while accessing
 the fTimeZone member of the Func class, mutex locks were added to the
 accessor and mutator methods. However, this slows down processing
 of TIMESTAMP columns in PrimProc significantly as all threads across
 all concurrently running queries would serialize on the mutex. This
 is because PrimProc only has a single global object for the functor
 class (class derived from Func in utils/funcexp/functor.h) for a given
 function name. To fix this problem:

   (1) We remove the fTimeZone as a member of the Func derived classes
   (hence removing the mutexes) and instead use the fOperationType
   member of the FunctionColumn class to propagate the timezone values
   down to the individual functor processing functions such as
   FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc.

   (2) To achieve (1), a timezone member is added to the
   execplan::CalpontSystemCatalog::ColType class.

Part 2:
 Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime()
 and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds
 since unix epoch and broken-down representation. These functions in turn call
 the C library function localtime_r() which currently has a known bug of holding
 a global lock via a call to __tz_convert. This significantly reduces performance
 in multi-threaded applications where multiple threads concurrently call
 localtime_r(). More details on the bug:
   https://sourceware.org/bugzilla/show_bug.cgi?id=16145

 This bug in localtime_r() caused processing of the Functors in PrimProc to
 slowdown significantly since a query execution causes Functors code to be
 processed in a multi-threaded manner.

 As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime()
 and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion
 (done in dataconvert::timeZoneToOffset()) during the execution plan
 creation in the plugin. Note that localtime_r() is only called when the
 time_zone system variable is set to "SYSTEM".

 This fix also required changing the timezone type from a std::string to
 a long across the system.
2022-02-14 14:12:27 -05:00
Leonid Fedorov
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
Leonid Fedorov
01f3ceb437 replace header guards with #pragma once 2022-01-21 15:24:58 +00:00
Gagan Goel
195425924d MCOL-4936 Disable binlog for DML statements.
DML statements executed on the primary node in a ColumnStore
cluster do not need to be written to the primary's binlog. This
is due to ColumnStore's distributed storage architecture.

With this patch, we disable writing to binlog when a DML statement
(INSERT/DELETE/UPDATE/LDI/INSERT..SELECT) is performed on a ColumnStore
table. HANDLER::external_lock() calls are used to
  1. Turn OFF the OPTION_BIN_LOG flag
  2. Turn ON the OPTION_BIN_TMP_LOG_OFF flag
in THD::variables.option_bits during a WRITE lock call.

THD::variables.option_bits is restored back to the original state
during the UNLOCK call in HANDLER::external_lock().

Further, isDMLStatement() function is added to reduce code verbosity
to check if a given statement is a DML statement.

Note that with this patch, not writing to primary's binlog means
DML replication from a ColumnStore cluster to another ColumnStore
cluster or to another foreign engine will not work.
2022-01-04 17:31:59 +00:00
Roman Nozdrin
b3ab3fb514 Merge pull request #2203 from mariadb-AlexeyAntipovsky/auto-query-stats
[MCOL-4944] Automatically enable stats collection
2021-12-23 15:13:43 +03:00
Alexey Antipovsky
683a6b3d19 [MCOL-4944] Automatically enable stats collection
if it is enabled in the config
2021-12-21 11:15:25 +03:00
Gagan Goel
7f456e58cc MCOL-4868 UPDATE on a ColumnStore table containing an IN-subquery
on a non-ColumnStore table does not work.

As part of MCOL-4617, we moved the in-to-exists predicate creation
and injection from the server into the engine. However, when query
with an IN Subquery contains a non-ColumnStore table, the server
still performs the in-to-exists predicate transformation for the
foreign engine table. This caused ColumnStore's execution plan to
contain incorrect WHERE predicates. As a fix, we call
mutate_optimizer_flags() for the WRITE lock, in addition to the READ
table lock. And in mutate_optimizer_flags(), we change the optimizer
flag from OPTIMIZER_SWITCH_IN_TO_EXISTS to OPTIMIZER_SWITCH_MATERIALIZATION.
2021-12-16 23:11:26 +00:00
Alexander Barkov
fa9f18553a MCOL-4728 Query with unusual use of aggregate functions on ColumnStore table crashes MariaDB Server
After an AggreateColumn corresponding to SUM(1+1) is created,
it is pushed to the list:

    gwi.count_asterisk_list.push_back(ac)

Later, in getSelectPlan(), the expression SUM(1+1) was erroneously
treated as a constant:

  if (!hasNonSupportItem && !nonConstFunc(ifp) && !(parseInfo & AF_BIT) && tmpVec.size() == 0)
  {
     srcp.reset(buildReturnedColumn(item, gwi, gwi.fatalParseError));

This code freed the original AggregateColumn and replaced to a ConstantColumn.

But gwi.count_asterisk_list still pointer to the freed AggregateColumn().

The expression SUM(1+1) was treated as a constant because tmpVec
was empty due to a bug in this code:

                    // special handling for count(*). This should not be treated as constant.
                    if (isp->argument_count() == 1 &&
                            ( sfitempp[0]->type() == Item::CONST_ITEM &&
                                (sfitempp[0]->cmp_type() == INT_RESULT ||
                                 sfitempp[0]->cmp_type() == STRING_RESULT ||
                                 sfitempp[0]->cmp_type() == REAL_RESULT ||
                                 sfitempp[0]->cmp_type() == DECIMAL_RESULT)
                            )
                        )
                    {
                        field_vec.push_back((Item_field*)item); //dummy

Notice, it handles only aggregate functions with explicit literals
passed as an argument, while it does not handle constant expressions
such as 1+1.

Fix:

- Adding new classes ConstantColumnNull, ConstantColumnString,
  ConstantColumnNum, ConstantColumnUInt, ConstantColumnSInt,
  ConstantColumnReal, ValStrStdString, to reuse the code easier.

- Moving a part of the code from the case branch handling CONST_ITEM
  in buildReturnedColumn() into a new function
  newConstantColumnNotNullUsingValNativeNoTz(). This
  makes the code easier to read and to reuse in the future.

- Adding a new function newConstantColumnMaybeNullFromValStrNoTz().
  Removing dulplicate code from !!!four!!! places, using the new
  function instead.

- Adding a function isSupportedAggregateWithOneConstArg() to
  properly catch all constant expressions. Using the new function parse_item()
  in the code commented as "special handling for count(*)".
  Now it pushes all constant expressions to field_vec, not only
  explicit literals.

- Moving a part of the code from buildAggregateColumn()
  to a helper function processAggregateColumnConstArg().
  Using processAggregateColumnConstArg() in the CONST_ITEM
  and NULL_ITEM branches.

- Adding a new branch in buildReturnedColumn() handling FUNC_ITEM.
  If a function has constant arguments, a ConstantColumn() is
  immediately created, without going to
  buildArithmeticColumn()/buildFunctionColumn().

- Reusing isSupportedAggregateWithOneConstArg()
  and processAggregateColumnConstArg() in buildAggregateColumn().
  A new branch catches aggregate function has only one constant argument
  and immediately creates a single ConstantColumn without
  traversing to the argument sub-components.
2021-09-21 14:00:56 +04:00
Roman Nozdrin
a465b60bdd MCOL-1482 Future repetition reduction 2021-07-01 12:27:03 +00:00
Gagan Goel
49255f5cbd MCOL-1482 An UPDATE operation on a non-ColumnStore table involving a
cross-engine join with a ColumnStore table errors out.

ColumnStore cannot directly update a foreign table. We detect whether
a multi-table UPDATE operation is performed on a foreign table, if so,
do not create the select_handler and let the server execute the UPDATE
operation instead.
2021-06-25 15:27:54 +00:00
Gagan Goel
e3d8100150 MCOL-4525 Implement columnstore_select_handler=AUTO.
This feature allows a query execution to fallback to the server,
in case query execution using the select_handler (SH) fails. In case
of fallback, a warning message containing the original reason for
query failure using SH is generated.

To accomplish this task, SH execution is moved to an earlier step when
we create the SH in create_columnstore_select_handler(), instead of the
previous call to SH execution in ha_columnstore_select_handler::init_scan().
This requires some pre-requisite steps that occur in the server in
JOIN::optimize() and JOIN::exec() to be performed before starting SH execution.

In addition, missing test cases from MCOL-424 are also added to the MTR suite,
and the corresponding fix using disable_indices_for_CEJ() is reverted back
since the original fix now appears to be redundant.
2021-06-11 11:35:34 +00:00
Gagan Goel
e0d2a21cb9 MCOL-4665 Move outer join to inner join conversion into the engine.
This is a subtask of MCOL-4525 Implement select_handler=AUTO.

Server performs outer join to inner join conversion using simplify_joins()
in sql/sql_select.cc, by updating the TABLE_LIST::outer_join variable.
In order to perform this conversion, permanent changes are made in some
cases to the SELECT_LEX::JOIN::conds and/or TABLE_LIST::on_expr.
This is undesirable for MCOL-4525 which will attemp to fallback and execute
the query inside the server, in case the query execution fails in ColumnStore
using the select_handler.

For a query such as:
  SELECT * FROM t1 LEFT JOIN t2 ON expr1 LEFT JOIN t3 ON expr2
In some cases, server can update the original SELECT_LEX::JOIN::conds
and/or TABLE_LIST::on_expr and create new Item_cond_and objects
(e.g. with 2 Item's expr1 and expr2 in Item_cond_and::list).
Instead of making changes to the original query structs, we use
gp_walk_info::tableOnExprList and gp_walk_info::condList. 2 Item's,
expr1 and expr2, in the condList, mean Item_cond_and(expr1, expr2), and
hence avoid permanent transformations to the SELECT_LEX.

We also define a new member variable
ha_columnstore_select_handler::tableOuterJoinMap
which saves the original TABLE_LIST::outer_join values before they are
updated. This member variable will be used later on to restore to the original
state of TABLE_LIST::outer_join in case of a query fallback to server execution.

The original simplify_joins() implementation in the server also performs a
flattening of the JOIN nest, however we don't perform this operation in
convertOuterJoinToInnerJoin() since it is not required for ColumnStore.
2021-06-03 11:13:19 +00:00
Roman Nozdrin
c6db1f9191 Merge pull request #1825 from tntnatbry/MCOL-4617
MCOL-4617 Move in-to-exists predicate creation and injection into the engine.
2021-05-07 13:33:02 +03:00
Gagan Goel
22c7fb7c01 MCOL-4680 FROM subquery containing nested joins returns an error.
Main theme of the patch is to fix joins processing in the plugin
code. We now use SELECT_LEX::top_join_list and process the nested
joins recursively, instead of SELECT_LEX::table_list struct which
we earlier used to build the join filters. The earlier approach
did not process certain nested join ON expressions, causing certain
queries to incorrectly error out such as that described in MCOL-4680.

In addition, some legacy code is also removed.
2021-05-03 06:28:27 +00:00
Gagan Goel
f167a6e505 MCOL-4617 Move in-to-exists predicate creation and injection into the engine.
We earlier leveraged the server functionality provided by

Item_in_subselect::create_in_to_exists_cond and
Item_in_subselect::inject_in_to_exists_cond

to create and inject the in-to-exists predicate into an IN
subquery's JOIN struct. With this patch, we leave the IN subquery's
JOIN unaltered and instead directly perform this predicate creation
and injection into ColumnStore's select execution plan.
2021-04-30 07:57:00 +00:00
Gagan Goel
13264feb7d MCOL-4320/4364/4370 Fix multibyte processing for LDI/Insert...Select
For CHAR/VARCHAR/TEXT fields, the buffer size of a field represents
the field size in bytes, which can be bigger than the field size in
number of characters, for multi-byte character sets such as utf8,
utf8mb4 etc. The buffer also contains a byte length prefix which can be
up to 65532 bytes for a VARCHAR field, and much higher for a TEXT
field (we process a maximum byte length for a TEXT field which fits in
4 bytes, which is 2^32 - 1 = 4GB!).

There is also special processing for a TEXT field defined with a default
length like so:
  CREATE TABLE cs1 (a TEXT CHARACTER SET utf8)
Here, the byte length is a fixed 65535, irrespective of the character
set used. This is different from a case such as:
  CREATE TABLE cs1 (a TEXT(65535) CHARACTER SET utf8), where the byte length
for the field will be 65535*3.
2020-10-26 17:51:24 +00:00
Alexander Barkov
7f6ad16728 MCOL-4303 UPDATE..SET using another table is not updating
The change for MCOL-4264 erroneously added the "lock_type" member
to cal_connection_info, which is shared between multiple tables.
So some tables that were opened for write erroneously identified
themselves as read only.

Moving the member to ha_mcs instead.
2020-09-11 12:26:26 +04:00