mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-08-07 03:22:57 +03:00
Commit Graph

1782 Commits

Author SHA1 Message Date
Sergey Zefirov
9e0851e4cf MCOL-4766 ROLLBACK kept ranges changed inside rolled back transaction
Now ROLLBACK drops ranges to the INVALID state, which makes the engine rescan
blocks and discover correct ranges.
2021-07-07 18:16:56 +03:00
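The invalidate-then-rescan behaviour described in MCOL-4766 can be sketched roughly as follows. This is only an illustration: `ExtentRange`, `RangeState`, `rescan`, `onRollback` and `rangeFor` are hypothetical names, not the engine's extent map API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-ins; the engine's real extent map types differ.
enum class RangeState { Valid, Invalid };

struct ExtentRange
{
  int64_t min = std::numeric_limits<int64_t>::max();
  int64_t max = std::numeric_limits<int64_t>::min();
  RangeState state = RangeState::Invalid;
};

// Recompute the range from the block contents and mark it valid again.
inline void rescan(ExtentRange& r, const std::vector<int64_t>& values)
{
  r = ExtentRange{};
  for (int64_t v : values)
  {
    r.min = std::min(r.min, v);
    r.max = std::max(r.max, v);
  }
  r.state = RangeState::Valid;
}

// On ROLLBACK, stop trusting a range the transaction may have changed.
inline void onRollback(ExtentRange& r)
{
  r.state = RangeState::Invalid;
}

// A reader rescans lazily when it meets an invalid range.
inline const ExtentRange& rangeFor(ExtentRange& r, const std::vector<int64_t>& values)
{
  if (r.state == RangeState::Invalid)
    rescan(r, values);
  return r;
}
```

The point of the design is that a stale range is never corrected in place; it is discarded and rediscovered from the data that survived the rollback.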
Roman Nozdrin
866dc25729 Merge pull request #1842 from denis0x0D/MCOL-987_LZ
MCOL-987 LZ4 compression support.
2021-07-07 13:13:18 +03:00
Roman Nozdrin
7b4f759592 Merge pull request #2032 from drrtuy/MCOL-4802
MCOL-4802 Removed ByteStream methods for bool and add some logging in…
2021-07-07 13:03:54 +03:00
Roman Nozdrin
fb5ba84212 MCOL-4802 Removed ByteStream methods for bool manipulations and add some logging into I_S.columnstore_files 2021-07-07 07:16:30 +00:00
Alexander Barkov
9794f24369 MCOL-4801 Replace Row methods getStringLength() and getStringPointer() to getConstString() 2021-07-06 21:15:32 +04:00
Denis Khalikov
cc1c3629c5 MCOL-987 Add LZ4 compression.
* Adds CompressInterfaceLZ4 which uses LZ4 API for compress/uncompress.
* Adds CMake machinery to search LZ4 on running host.
* All methods which use static data and do not modify any internal data - become `static`,
  so we can use them without creation of the specific object. This is possible, because
  the header specification has not been modified. We still use 2 sections in header, first
  one with file meta data, the second one with pointers for compressed chunks.
* Methods `compress`, `uncompress`, `maxCompressedSize`, `getUncompressedSize` - become
  pure virtual, so we can override them for the other compression algos.
* Adds method `getChunkMagicNumber`, so we can verify chunk magic number
  for each compression algo.
* Renames "s/IDBCompressInterface/CompressInterface/g" according to requirement.
2021-07-06 18:04:37 +03:00
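The class layout described in MCOL-987 (pure virtual per-algorithm methods, static header helpers, a per-algorithm chunk magic number) might look roughly like the sketch below. Only the method names `compress`, `uncompress`, `maxCompressedSize` and `getChunkMagicNumber` come from the commit message; the signatures, the `headerSize` helper, the magic value and the toy `StoredCompressor` are assumptions made for illustration, and no LZ4 calls are used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical signatures; only the method names come from the commit message.
class CompressInterface
{
 public:
  virtual ~CompressInterface() = default;

  // Per-algorithm operations are pure virtual so each codec overrides them.
  virtual int compress(const char* in, size_t inLen, char* out, size_t* outLen) const = 0;
  virtual int uncompress(const char* in, size_t inLen, char* out, size_t* outLen) const = 0;
  virtual size_t maxCompressedSize(size_t uncompressedSize) const = 0;
  virtual uint8_t getChunkMagicNumber() const = 0;

  // Header handling does not depend on the codec, so it can be static.
  // The two-section layout is from the commit; the helper itself is assumed.
  static size_t headerSize(size_t sectionSize) { return 2 * sectionSize; }
};

// Toy codec: stores bytes unchanged behind a one-byte magic number.
class StoredCompressor final : public CompressInterface
{
 public:
  int compress(const char* in, size_t inLen, char* out, size_t* outLen) const override
  {
    if (*outLen < maxCompressedSize(inLen))
      return -1;
    out[0] = static_cast<char>(getChunkMagicNumber());
    std::memcpy(out + 1, in, inLen);
    *outLen = inLen + 1;
    return 0;
  }

  int uncompress(const char* in, size_t inLen, char* out, size_t* outLen) const override
  {
    // Reject chunks that were not written by this codec.
    if (inLen < 1 || static_cast<uint8_t>(in[0]) != getChunkMagicNumber())
      return -1;
    if (*outLen < inLen - 1)
      return -1;
    std::memcpy(out, in + 1, inLen - 1);
    *outLen = inLen - 1;
    return 0;
  }

  size_t maxCompressedSize(size_t uncompressedSize) const override { return uncompressedSize + 1; }
  uint8_t getChunkMagicNumber() const override { return 0x7e; }  // arbitrary illustrative value
};
```

A caller that only needs header arithmetic can use `CompressInterface::headerSize(...)` without constructing any codec object, which is the benefit the commit attributes to making such methods static.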
Roman Nozdrin
b9bd207d3b Merge pull request #2029 from tntnatbry/MCOL-641-cleanup
MCOL-641 Cleanup.
2021-07-06 14:35:21 +03:00
Gagan Goel
8520f87237 MCOL-641 Cleanup. 2021-07-06 09:01:49 +00:00
Leonid Fedorov
3fb5579708 Delete duplicate precision initialization in distinct aggregate prepare 2021-07-05 15:54:58 +03:00
David.Hall
237cad347f MCOL-4758 Limit LONGTEXT and LONGBLOB to 16MB (#1995)
MCOL-4758 Limit LONGTEXT and LONGBLOB to 16MB

Also add the original test case from MCOL-3879.
2021-07-05 02:09:41 -04:00
Roman Nozdrin
6b823db28b Merge pull request #1913 from denis0x0D/MCOL-1205
MCOL-1205 Support queries with circular joins
2021-07-02 20:47:14 +03:00
Roman Nozdrin
a4aecc120e Merge pull request #2006 from tntnatbry/fix-const-scalar-subselect
Fixes for queries containing constant scalar subselects in the WHERE clause.
2021-07-02 20:15:09 +03:00
Gagan Goel
8d0ca55495 Fixes for queries containing constant scalar subselects in the WHERE clause.
For queries of the form:
SELECT col1 FROM t1 WHERE col2 = (SELECT 2);

We fix the execution plan which earlier had an empty filters
expression. For this query, we now build a SimpleFilter with a
SimpleColumn and a ConstantColumn as the LHS and the RHS operands
respectively.

For queries of the form:
SELECT ... WHERE col1 NOT IN (SELECT <const_item>);

The execution plan earlier built a SimpleFilter with an "=" as
the predicate operator of the filter. We fix this by assigning
the correct "<>" operator instead.
2021-07-02 16:40:30 +00:00
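The operator choice described in the commit above can be illustrated with a small sketch. `SimpleFilterSketch`, `SubselectKind` and `buildConstSubselectFilter` are hypothetical, simplified stand-ins; the engine's real SimpleFilter, SimpleColumn and ConstantColumn classes are considerably richer.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for the engine's filter classes.
struct SimpleFilterSketch
{
  std::string lhsColumn;    // plays the role of the SimpleColumn (LHS)
  std::string op;           // predicate operator
  std::string rhsConstant;  // plays the role of the ConstantColumn (RHS)
};

enum class SubselectKind { ScalarEquals, NotIn };

// Build the filter for "col = (SELECT <const>)" or "col NOT IN (SELECT <const>)".
inline SimpleFilterSketch buildConstSubselectFilter(const std::string& column, SubselectKind kind,
                                                    const std::string& constant)
{
  // The fix described above: NOT IN must map to "<>", not "=".
  const std::string op = (kind == SubselectKind::NotIn) ? "<>" : "=";
  return SimpleFilterSketch{column, op, constant};
}
```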
Denis Khalikov
1d5f309b8f MCOL-1205 Support queries with circular joins
This patch adds support for queries with circular joins.
Currently support added for inner joins only.
2021-07-02 18:37:07 +03:00
Roman Nozdrin
6dc356ed60 Merge pull request #1989 from denis0x0D/MCOL-4713
MCOL-4713 Analyze table implementation.
2021-07-02 16:17:07 +03:00
Roman Nozdrin
4016e25e5b Merge pull request #2022 from mariadb-corporation/bar-develop-MCOL-4791
MCOL-4791 Fix ColumnCommand fudged data type format to clearly identi…
2021-07-02 14:07:03 +03:00
Denis Khalikov
c20015a7b2 MCOL-4713 Analyze table implementation. 2021-07-02 12:37:12 +03:00
Alexander Barkov
e8126bede5 MCOL-4791 Fix ColumnCommand fudged data type format to clearly identify CHAR vs VARCHAR 2021-07-02 12:42:03 +04:00
Roman Nozdrin
a465b60bdd MCOL-1482 Future repetition reduction 2021-07-01 12:27:03 +00:00
Roman Nozdrin
325bb6c9e0 Merge pull request #1986 from tntnatbry/MCOL-1482
MCOL-1482 An UPDATE operation on a non-ColumnStore table involving a cross-engine join
2021-07-01 14:25:32 +03:00
David.Hall
132146b9c8 MCOL-3738 Allow COUNT(DISTINCT) to have multiple parameters (#2002)
* MCOL-3738 allow COUNT(DISTINCT) multiple parameters
Changes in the way tupleaggregatestep sets up the aggregate arrays.

* MCOL-3738 mtr test
2021-06-28 20:14:44 +03:00
Roman Nozdrin
5ef676c734 Merge pull request #2003 from drrtuy/MCOL-4770
MCOL-4770 Installation.PMwithUM is enabled by default so a plugin com…
2021-06-28 12:42:38 +03:00
Gagan Goel
49255f5cbd MCOL-1482 An UPDATE operation on a non-ColumnStore table involving a
cross-engine join with a ColumnStore table errors out.

ColumnStore cannot directly update a foreign table. We detect whether
a multi-table UPDATE operation is performed on a foreign table, if so,
do not create the select_handler and let the server execute the UPDATE
operation instead.
2021-06-25 15:27:54 +00:00
David.Hall
28fd12a008 Merge pull request #1874 from mariadb-corporation/bar-develop-MCOL-4681
MCOL-4681 Fix install_mcs_mysql.sh.in to do CREATE FUNCTION instead o…
2021-06-24 09:11:06 -05:00
Roman Nozdrin
2de4888899 Merge pull request #1990 from drrtuy/MCOL-4173_9
MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI…
2021-06-24 16:15:07 +03:00
Roman Nozdrin
14dbb6a8fb MCOL-4770 Installation.PMwithUM is enabled by default so a plugin communicates with a local EM only
Fix in DEC::Setup to allow to re-establish all PM connections in EM
2021-06-24 11:07:54 +00:00
Roman Nozdrin
bed0b7c6bc MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, functional JOINs
based on top of TypelessData
2021-06-24 08:07:23 +00:00
Gagan Goel
7c8b502dc2 Fix regression in a query involving an aggregate function on a
non-wide decimal column in the HAVING clause.

In buildAggregateColumn(), if an aggregate function (such as avg)
is applied on a non-wide decimal column, we were setting the precision
of the resulting column to -1. Later in the execution this was
converted to 255 because, in some cases, precision is stored as uint8_t.
The predicate operations on a DECIMAL column have logic that uses
the wide Decimal::s128value field if precision > 18. This logic incorrectly
used the Decimal::s128value instead of the correct value stored in the
narrow Decimal::value field, since precision of the Decimal column
was 255. The fix is to set the aggregate column precision to
datatypes::INT64MAXPRECISION (18) in buildAggregateColumn() when the
aggregate is applied on a non-wide decimal column.

This commit also partially fixes -Wstrict-aliasing GCC warnings.
2021-06-22 11:11:34 +00:00
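The wrap-around at the heart of this regression is easy to reproduce in isolation. The sketch below is not the engine's code: `storePrecision` and `usesWideField` are hypothetical helpers, and only the values -1, 255 and 18 come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Value the commit states for datatypes::INT64MAXPRECISION.
constexpr int kInt64MaxPrecision = 18;

// Storing a precision of -1 into a uint8_t wraps it to 255.
inline uint8_t storePrecision(int precision)
{
  return static_cast<uint8_t>(precision);
}

// The wide (128-bit) field is selected whenever precision > 18.
inline bool usesWideField(uint8_t precision)
{
  return precision > kInt64MaxPrecision;
}
```

With a precision of -1 the narrow column is misclassified as wide; setting the precision to 18 keeps it on the narrow path, which matches the fix described above.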
Roman Nozdrin
e153486361 Merge pull request #1994 from denis0x0D/MCOL-4685_rename
MCOL-4685 Rename UNUSED -> SNAPPY
2021-06-16 11:24:17 +03:00
Denis Khalikov
e2a5956ef8 MCOL-4685 Rename UNUSED -> SNAPPY 2021-06-15 21:19:09 +03:00
Roman Nozdrin
7086694d48 Merge pull request #1991 from drrtuy/MCOL-4694
MCOL-4694 Add the JobList.DECConnectionsPerQuery setting to allow to
2021-06-14 20:54:09 +03:00
Roman Nozdrin
96f2a55eea Merge pull request #1970 from tntnatbry/MCOL-4525
MCOL-4525 Implement columnstore_select_handler=AUTO.
2021-06-14 10:43:34 +03:00
Roman Nozdrin
736b9b81fc MCOL-4694 Add the JobList.DECConnectionsPerQuery setting to allow to
distribute Jobs b/w DEC connections from EM to PPs
2021-06-14 07:15:03 +00:00
Gagan Goel
e3d8100150 MCOL-4525 Implement columnstore_select_handler=AUTO.
This feature allows a query execution to fallback to the server,
in case query execution using the select_handler (SH) fails. In case
of fallback, a warning message containing the original reason for
query failure using SH is generated.

To accomplish this task, SH execution is moved to an earlier step when
we create the SH in create_columnstore_select_handler(), instead of the
previous call to SH execution in ha_columnstore_select_handler::init_scan().
This requires some pre-requisite steps that occur in the server in
JOIN::optimize() and JOIN::exec() to be performed before starting SH execution.

In addition, missing test cases from MCOL-424 are also added to the MTR suite,
and the corresponding fix using disable_indices_for_CEJ() is reverted back
since the original fix now appears to be redundant.
2021-06-11 11:35:34 +00:00
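The fallback control flow described for columnstore_select_handler=AUTO can be sketched as below. The names `QueryResult` and `runWithAutoFallback` are hypothetical, and using C++ exceptions to signal a select_handler failure is an assumption of this sketch, not a statement about how the plugin reports errors.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result record for the sketch.
struct QueryResult
{
  std::string executedBy;
  std::vector<std::string> warnings;
};

// Try the select_handler first; on failure, keep the reason as a warning
// and let the server execute the query instead.
inline QueryResult runWithAutoFallback(const std::function<void()>& runSelectHandler,
                                       const std::function<void()>& runInServer)
{
  QueryResult result;
  try
  {
    runSelectHandler();
    result.executedBy = "select_handler";
  }
  catch (const std::exception& e)
  {
    result.warnings.push_back(std::string("select_handler failed: ") + e.what());
    runInServer();
    result.executedBy = "server";
  }
  return result;
}
```

For the fallback to be safe, the first attempt must not leave permanent changes in the query structures, which is why the commit moves select_handler execution earlier and why the related MCOL-4665 work avoids mutating the original query.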
Alexander Barkov
d00ace2398 MCOL-4757 Empty set in SELECT * INFORMATION_SCHEMA.COLUMNSTORE_TABLES WHERE TABLE_NAME='t1' 2021-06-11 12:00:23 +04:00
Alexey Antipovsky
0dedb7e628 Fix compilation warnings 2021-06-09 16:51:00 +03:00
Denis Khalikov
2cd024145c MCOL-4685: Fix GCC warnings.
This patch fixes GCC warnings:
1.'const long int' and 'long unsigned int' [-Werror=sign-compare]
2. unused variable 'cf' [-Werror=unused-variable]
2021-06-09 14:33:32 +03:00
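As an illustration of the first warning class, the sketch below shows one common way to remove a -Wsign-compare diagnostic when a `const long` is compared with an unsigned size. `fitsWithin` is a hypothetical function, not code from the patch; the second warning class (an unused variable such as `cf`) is normally fixed by deleting the variable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Comparing `const long limit` directly with `v.size()` (an unsigned type)
// triggers -Wsign-compare. Rule out negative values first, then compare
// in the unsigned type.
inline bool fitsWithin(const std::vector<int>& v, const long limit)
{
  if (limit < 0)
    return false;
  return v.size() <= static_cast<std::size_t>(limit);
}
```

Without the explicit negative check, a limit of -1 would be converted to a very large unsigned value and the comparison would silently succeed, which is the kind of bug the warning exists to catch.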
Roman Nozdrin
7a152c6a19 Merge pull request #1944 from mariadb-AlexeyAntipovsky/MCOL-563-dev
[MCOL-4709] Disk-based aggregation
2021-06-08 20:42:58 +03:00
Alexey Antipovsky
475104e4d3 [MCOL-4709] Disk-based aggregation
* Introduce multigeneration aggregation

* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)

* Reduce memory usage while aggregating
* introduce in-memory generations to better memory utilization

* Try to limit the qty of buckets at a low limit

* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory

* do not dump more than half of rowgroups to disk if generations are
  allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
  so the generations start more evenly

* Unify temp data location

* Explicitly create temp subdirectories
  whether disk aggregation/join are enabled or not
2021-06-06 16:09:15 +03:00
Roman Nozdrin
47e9fc0312 Merge pull request #1922 from denis0x0D/MCOL-4685
MCOL-4685: Eliminate some irrelevant settings (uncompressed data and extents per file)
2021-06-06 16:00:29 +03:00
Gagan Goel
3537c0d635 Merge pull request #1962 from tntnatbry/MCOL-4642
MCOL-4642 NOT IN subquery containing an isnull in the OR predicate crashes server.
2021-06-04 07:18:46 -04:00
Roman Nozdrin
911a41f5be Merge pull request #1935 from tntnatbry/MCOL-4665
MCOL-4665 Move outer join to inner join conversion into the engine.
2021-06-03 16:05:58 +03:00
Denis Khalikov
606194e6e4 MCOL-4685: Eliminate some irrelevant settings (uncompressed data and extents per file).
This patch:
1. Removes the option to declare uncompressed columns (set columnstore_compression_type = 0).
2. Ignores [COMMENT '[compression=0] option at table or column level (no error messages, just disregard).
3. Removes the option to set more than 2 extents per file (ExtentsPerSegmentFile).
4. Updates rebuildEM tool to support up to 10 dictionary extents per dictionary segment file.
5. Adds check for `DBRootStorageType` for rebuildEM tool.
6. Renamed rebuildEM to mcsRebuildEM.
2021-06-03 14:44:33 +03:00
Gagan Goel
e0d2a21cb9 MCOL-4665 Move outer join to inner join conversion into the engine.
This is a subtask of MCOL-4525 Implement select_handler=AUTO.

Server performs outer join to inner join conversion using simplify_joins()
in sql/sql_select.cc, by updating the TABLE_LIST::outer_join variable.
In order to perform this conversion, permanent changes are made in some
cases to the SELECT_LEX::JOIN::conds and/or TABLE_LIST::on_expr.
This is undesirable for MCOL-4525 which will attempt to fallback and execute
the query inside the server, in case the query execution fails in ColumnStore
using the select_handler.

For a query such as:
  SELECT * FROM t1 LEFT JOIN t2 ON expr1 LEFT JOIN t3 ON expr2
In some cases, server can update the original SELECT_LEX::JOIN::conds
and/or TABLE_LIST::on_expr and create new Item_cond_and objects
(e.g. with 2 Item's expr1 and expr2 in Item_cond_and::list).
Instead of making changes to the original query structs, we use
gp_walk_info::tableOnExprList and gp_walk_info::condList. 2 Item's,
expr1 and expr2, in the condList, mean Item_cond_and(expr1, expr2), and
hence avoid permanent transformations to the SELECT_LEX.

We also define a new member variable
ha_columnstore_select_handler::tableOuterJoinMap
which saves the original TABLE_LIST::outer_join values before they are
updated. This member variable will be used later on to restore to the original
state of TABLE_LIST::outer_join in case of a query fallback to server execution.

The original simplify_joins() implementation in the server also performs a
flattening of the JOIN nest, however we don't perform this operation in
convertOuterJoinToInnerJoin() since it is not required for ColumnStore.
2021-06-03 11:13:19 +00:00
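The save-and-restore role of tableOuterJoinMap can be sketched as follows. `TableRef` is a hypothetical stand-in for the server's TABLE_LIST, and `convertToInner` and `restoreOuterJoins` are illustrative helpers; the actual member and function signatures in the plugin are not shown in the commit message.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the server's TABLE_LIST.
struct TableRef
{
  unsigned outer_join = 0;
};

using TableOuterJoinMap = std::map<TableRef*, unsigned>;

// Remember the original flag before converting an outer join to an inner join.
inline void convertToInner(TableRef& t, TableOuterJoinMap& saved)
{
  saved.emplace(&t, t.outer_join);  // emplace keeps the first (original) value
  t.outer_join = 0;
}

// On fallback to server execution, put every flag back as it was.
inline void restoreOuterJoins(TableOuterJoinMap& saved)
{
  for (auto& entry : saved)
    entry.first->outer_join = entry.second;
  saved.clear();
}
```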
Alexander Barkov
d61690748e MCOL-4743 Regression: TIME_TO_SEC(const_expr) erroneously returns 0 2021-06-03 11:16:53 +04:00
Gagan Goel
6f69194462 MCOL-4642 NOT IN subquery containing an isnull in the OR predicate crashes server.
InSub::handleFunc() was incorrectly exiting early for an IN subquery
containing an isnull predicate in the OR operation in the WHERE clause.
This patch properly handles the OR predicate containing an isnull/isnotnull
predicate in the WHERE clause. We don't remove the isnull operand from the
filter ParseTree in 6.x as the server no longer injects the isnull predicate
in the IN subquery due to MCOL-4617, where we moved the creation and injection
of in-to-exists predicate into the engine.
2021-06-03 01:34:14 -04:00
Roman Nozdrin
c15eb6531e MCOL-4734 Centos7 compilation fix 2021-05-25 18:25:48 +00:00
Roman Nozdrin
42e710f817 Merge pull request #1942 from mariadb-corporation/bar-develop-compile-10.6
Fixing 10.6 + develop compilation failure
2021-05-25 14:31:37 +03:00
Alexander Barkov
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a defect in the MariaDB headers that should be
  fixed in the server code. In some cases my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  The former is not CMake-compatible style. The latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.
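The difference between the two definition styles mentioned under item 2 can be made visible by stringifying the macro values. The `SKETCH_` macro names and helper functions below are hypothetical and exist only for this demonstration.

```cpp
#include <cassert>
#include <cstring>

// Two-level stringification so the argument is expanded before being quoted.
#define SKETCH_STR2(x) #x
#define SKETCH_STR(x) SKETCH_STR2(x)

#define SKETCH_HAVE_EMPTY  // empty replacement list, as in "#define HAVE_STRTOLL"
#define SKETCH_HAVE_ONE 1  // CMake-style, as in "#define HAVE_STRTOLL 1"

inline const char* emptyStyleValue() { return SKETCH_STR(SKETCH_HAVE_EMPTY); }
inline const char* oneStyleValue() { return SKETCH_STR(SKETCH_HAVE_ONE); }
```

Because the two replacement lists differ, defining the same macro name both ways in one translation unit is a conflicting redefinition that compilers diagnose, whereas two identical definitions (both `1`) are accepted.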

Observations:
- It's OK to include both mcsconfig.h and my_config.h providing that we
  suppress duplicate definition of the above conflicting types #1 and #2.
- There is no need to suppress duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuration header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceding configuration header.

This change makes sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (with their later restoring).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now be included only after my_config.h.
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be included only after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcsconfig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"
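The remember / undefine / restore sequence outlined for collation.h can be sketched with the `push_macro` and `pop_macro` pragmas supported by GCC, Clang and MSVC. That the helper headers use exactly these pragmas is an assumption here (the commit only speaks of a "preprocessor stack"), and the string values are placeholders rather than real version strings.

```cpp
#include <cassert>
#include <cstring>

#define VERSION "server"  // pretend this came from my_config.h

#pragma push_macro("VERSION")  // remember the current definition
#undef VERSION                 // undefine it to avoid a conflicting redefinition
#define VERSION "engine"       // pretend this came from mcsconfig.h
inline const char* versionSeenInside() { return VERSION; }
#pragma pop_macro("VERSION")   // restore the remembered definition

inline const char* versionSeenAfter() { return VERSION; }
```

Code between the push and the pop sees the second configuration header's value, while code after the pop sees the first header's value again, so neither header's consumers are disturbed.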

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the fix of the compilation failure it appeared that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
Gagan Goel
a8ceb1d090 Merge pull request #1938 from mariadb-corporation/bar-develop-MCOL-4276
MCOL-4726 Wrong result of WHERE char1_col='A'
2021-05-24 06:43:11 -04:00