mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-08-05 16:15:50 +03:00
187 Commits

Author SHA1 Message Date
Roman Nozdrin
dd96e686c0 MCOL-5153 This patch replaces MDB collation aware hash function with the (#2488)
exact functionality that does not use MDB hash function.
This patch also borrows a previously overlooked bit from the Robin Hood hash map
implementation that reduces the hash function collision rate.
2022-08-07 02:36:03 +03:00
Andrey Piskunov
c3a5731890 Rename cmpGt2 2022-08-04 16:16:38 +03:00
Andrey Piskunov
24b2c1c283 Vectorizing min/max for KIND_TEXT 2022-08-04 16:16:38 +03:00
NTH19
c4798ce585 fix 2022-08-04 16:16:38 +03:00
NTH19
19ca844cd1 support_max_min 2022-08-04 16:16:38 +03:00
Andrey Piskunov
589b786fda Don't ignore null or empty in calculation 2022-08-04 16:16:38 +03:00
Andrey Piskunov
20f48fd730 Vectorized update min max 2022-08-04 16:16:38 +03:00
Andrey Piskunov
b8200acd3b Don't ignore null or empty in calculation 2022-08-04 16:16:38 +03:00
Andrey Piskunov
1681edaca0 Tests for simd min/max 2022-08-04 16:16:38 +03:00
Andrey Piskunov
9930d0dedd Vectorized update min max 2022-08-04 16:16:38 +03:00
Roman Nozdrin
df431ebad9 MCOL-5093 This patch raises the hardcoded service start timeout to 2 hours (#2469) 2022-07-22 12:25:24 -05:00
mariadb-RomanNavrotskiy
194f0e9d64 ci: new builds grid, parallel steps 2022-07-08 22:30:02 +02:00
NTH19
d451b5c7c5 fix 2022-06-24 18:06:04 +08:00
NTH19
4c0b8fd829 simd of arm neon
unit testing

pass unit test for simdprocessor

add test cases

implement specific _mm_movemask for different types

float movemask change

rename
2022-06-24 11:24:59 +08:00
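The message above mentions implementing a type-specific `_mm_movemask` for ARM NEON, which has no single instruction for it. As a rough illustration of the semantics only (not the code from this commit), a portable scalar sketch might look like the following. The name `movemaskScalar` and the use of `std::array` to model a 128-bit vector are assumptions made for this example, and it covers integer lanes only; float lanes would first need their bits reinterpreted as integers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper: collects the most significant bit of every lane of a
// 128-bit vector (modelled here as a std::array) into an integer mask,
// one bit per lane, lane 0 in the lowest bit.
template <typename T>
std::uint16_t movemaskScalar(const std::array<T, 16 / sizeof(T)>& lanes)
{
  using U = std::make_unsigned_t<T>;
  constexpr unsigned topBit = sizeof(T) * 8 - 1;
  std::uint16_t mask = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i)
  {
    // Reinterpret the lane as unsigned so the shift is well defined.
    const U bits = static_cast<U>(lanes[i]);
    mask |= static_cast<std::uint16_t>(((bits >> topBit) & 1u) << i);
  }
  return mask;
}
```

With 8-bit lanes the mask has 16 meaningful bits and with 32-bit lanes it has 4, which is why a separate variant per element type is needed.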
david.hall
6d47529499 Merge branch 'develop' into MCOL-4841 2022-06-14 14:41:41 -05:00
David.Hall
272246e9fa Merge branch 'develop' into MCOL-4841 2022-06-09 16:58:33 -05:00
david.hall
3b6449842f Merge branch 'develop' into MCOL-4841
# Conflicts:
#	exemgr/main.cpp
#	oam/etc/Columnstore.xml.singleserver
#	primitives/primproc/primproc.cpp
2022-06-09 10:07:26 -05:00
Roman Nozdrin
7c9da5709d MCOL-5105 This patch raises pipe read operation timeout to 20 minutes
to enable DMLProc to survive rollbacks on startup.
The patch also fixes linter warnings in service.h and pipe.h.
2022-06-09 14:34:50 +00:00
Roman Nozdrin
4e50fca460 Merge pull request #2401 from denis0x0D/statistic_man
StatisticsManager initialize all plugins.
2022-06-03 15:41:28 +05:30
Denis Khalikov
6c0ebd568b StatisticsManager initialize all plugins.
This patch adds support for initializing all plugins in the system.
2022-05-31 12:42:00 +03:00
Leonid Fedorov
c25ae4f378 Use external boost 1.78 2022-05-02 18:23:37 +00:00
Leonid Fedorov
5820a21e19 Merge pull request #2331 from drrtuy/MCOL-5001-pp-em-combo-merge-1
Mcol 5001 pp em combo merge 1
2022-04-13 15:16:18 +03:00
Roman Nozdrin
e174696351 MCOL-5001 This patch merges ExeMgr and PrimProc runtimes
EM and PP are the most resource-hungry runtimes.
        The merge makes it possible to control their cumulative
        resource consumption and thread allocation, and enables
        zero-copy data exchange between local EM and PP facilities.
2022-04-04 11:46:33 +00:00
Roman Nozdrin
7cdc914b4e MCOL-4809 This patch introduces vectorized scanning/filtering for short CHAR/VARCHAR columns
Short CHAR/VARCHAR column values contain integer-encoded strings.
    After certain manipulations (orderSwap(strnxfrm(str))) the values
    become integers that preserve the order of the original strings
    under a given set of translation rules (the collation). The prepared
    values are ready to be SIMD-processed.
2022-04-01 10:28:33 +00:00
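The idea of turning a short string into an integer that keeps the string order can be sketched as follows. This is a simplified illustration, not the engine's code: it skips the collation step (`strnxfrm`) entirely and only shows why byte order matters. The helper name `packPrefix` and the zero padding are assumptions for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: packs the first 8 bytes of `s` into an integer with the
// first byte as the most significant one. Shorter strings are zero-padded on
// the right (a real collation may pad differently, e.g. with spaces).
inline std::uint64_t packPrefix(std::string_view s)
{
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < 8; ++i)
  {
    const unsigned char byte = i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
    v = (v << 8) | byte;
  }
  return v;
}
```

Because the first character ends up in the highest byte, comparing two packed values as unsigned integers gives the same result as comparing the original strings byte by byte, and integer comparisons are what SIMD instructions handle well.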
Leonid Fedorov
65252df4f6 C++20 fixes 2022-03-28 12:32:29 +00:00
Leonid Fedorov
ba0306e5ce Fix build with Server 10.7 and newer. It's kind of a hack, but it works; it can be reverted when the server code is fixed 2022-03-16 15:43:02 +00:00
Serguey Zefirov
53b9a2a0f9 MCOL-4580 extent elimination for dictionary-based text/varchar types
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.

The actual patch does have all the code there but misses one important
step: we do not keep the collation index, we keep the charset index. Because of
this, some of the tests in the bugfix suite fail and thus the main
functionality is turned off.

The reason this patch is put into a PR at all is that it contains
changes that made CHAR/VARCHAR columns unsigned. This change is needed in
the vectorization work.
2022-03-02 23:53:39 +03:00
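Extent elimination over encoded prefixes, as described in the message above, can be sketched like this. It is an illustration under stated assumptions rather than the patch itself: `ExtentRange` and `extentsToScan` are hypothetical names, and the prefixes are assumed to be already encoded as order-preserving integers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical extent summary: the smallest and largest encoded prefix
// seen among the strings stored in the extent.
struct ExtentRange
{
  std::uint64_t minPrefix;
  std::uint64_t maxPrefix;
};

// Returns the indexes of extents that may contain a string whose encoded
// prefix equals `needle`. All other extents can be skipped without reading.
inline std::vector<std::size_t> extentsToScan(const std::vector<ExtentRange>& extents,
                                              std::uint64_t needle)
{
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < extents.size(); ++i)
  {
    if (needle >= extents[i].minPrefix && needle <= extents[i].maxPrefix)
      keep.push_back(i);
  }
  return keep;
}
```

The check is conservative: since only prefixes are compared, an extent that passes may still hold no matching string, but an extent that fails cannot hold one.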
Leonid Fedorov
3919c541ac New warnfixes (#2254)
* Fix clang warnings

* Remove vim tab guides

* initialize variables

* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length

* Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison

* chars are unsigned on ARM, making if (ival < 0) always false

* chars are unsigned by default on ARM and comparison with -1 is always true
2022-02-17 13:08:58 +03:00
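The last two bullets concern the signedness of plain `char`, which is implementation-defined and is unsigned by default on ARM. A minimal sketch of the effect, using explicit `signed char` and `unsigned char` so the behaviour is the same on every platform (`looksNegative` is a hypothetical helper for this example):

```cpp
#include <type_traits>

// Hypothetical helper: stores the value in CharT and then applies the
// "is it negative?" check mentioned in the bullets above.
template <typename CharT>
bool looksNegative(int value)
{
  const CharT c = static_cast<CharT>(value);
  return c < 0;  // can never be true when CharT is unsigned
}

// Whether plain `char` behaves like the signed or the unsigned variant
// depends on the target platform and compiler flags.
constexpr bool plainCharIsSigned = std::is_signed_v<char>;
```

`looksNegative<signed char>(-1)` is true while `looksNegative<unsigned char>(-1)` is false, so code that relies on plain `char` being negative behaves differently on ARM than on x86.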
Leonid Fedorov
cad6736d64 enum for SIMD out of ifdef 2022-02-11 18:18:23 +03:00
David Hall
27dea733c5 MCOL4841 dev port run large join without OOM 2022-02-09 17:33:55 -06:00
Roman Nozdrin
c79dfc4925 MCOL-4809 This patch adds support for float data types filtering and scanning vectorization 2022-02-03 16:38:56 +00:00
Leonid Fedorov
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
Leonid Fedorov
6b6411229f build fixes 2022-01-21 16:34:04 +00:00
Leonid Fedorov
01f3ceb437 replace header guards with #pragma once 2022-01-21 15:24:58 +00:00
Roman Nozdrin
af36f9940f This patch introduces support for scanning/filtering vectorized execution for numeric-based
data types. TEXT, CHAR, VARCHAR, FLOAT and DOUBLE are not yet supported by the vectorized path.
This patch also introduces an example for the Google benchmarking suite to measure the performance difference
between the legacy scan/filtering code and the templated version.
2021-12-10 10:30:00 +00:00
Leonid Fedorov
5c5f103f98 MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
Alexander Barkov
9794f24369 MCOL-4801 Replace Row methods getStringLength() and getStringPointer() to getConstString() 2021-07-06 21:15:32 +04:00
Denis Khalikov
1d5f309b8f MCOL-1205 Support queries with circular joins
This patch adds support for queries with circular joins.
Currently support added for inner joins only.
2021-07-02 18:37:07 +03:00
Denis Khalikov
c20015a7b2 MCOL-4713 Analyze table implementation. 2021-07-02 12:37:12 +03:00
Roman Nozdrin
d8cbc000e2 Merge pull request #2004 from drrtuy/MCOL-4759
MCOL-4759 Upmerge for MCOL-4564 code that implements hash merging fam…
2021-06-28 14:05:16 +03:00
Roman Nozdrin
8c360a1a27 MCOL-4759 Upmerge for MCOL-4564 code that implements hash merging family to reduce
performance penalty using MDB hashing functions
2021-06-24 14:48:01 +00:00
Roman Nozdrin
2de4888899 Merge pull request #1990 from drrtuy/MCOL-4173_9
MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI…
2021-06-24 16:15:07 +03:00
Roman Nozdrin
bed0b7c6bc MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, functional JOINs
based on top of TypelessData
2021-06-24 08:07:23 +00:00
Gagan Goel
7c8b502dc2 Fix regression in a query involving an aggregate function on a
non-wide decimal column in the HAVING clause.

In buildAggregateColumn(), if an aggregate function (such as avg)
is applied on a non-wide decimal column, we were setting the precision
of the resulting column to -1. Later in the execution this got
converted to 255 because, in some cases, precision is stored as uint8_t.
The predicate operations on a DECIMAL column have logic that uses
the wide Decimal::s128value field if precision > 18. This logic incorrectly
used the Decimal::s128value instead of the correct value stored in the
narrow Decimal::value field, since precision of the Decimal column
was 255. The fix is to set the aggregate column precision to
datatypes::INT64MAXPRECISION (18) in buildAggregateColumn() when the
aggregate is applied on a non-wide decimal column.

This commit also partially fixes -Wstrict-aliasing GCC warnings.
2021-06-22 11:11:34 +00:00
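The regression described above comes down to a narrowing conversion. The following sketch reproduces the arithmetic only; `storePrecision`, `usesWideValue` and `kInt64MaxPrecision` are hypothetical stand-ins for the real code paths, with the constant mirroring the value 18 given for datatypes::INT64MAXPRECISION in the message.

```cpp
#include <cstdint>

// Mirrors the value of datatypes::INT64MAXPRECISION quoted in the message.
constexpr int kInt64MaxPrecision = 18;

// Hypothetical stand-in for the places that keep precision in a uint8_t.
inline std::uint8_t storePrecision(int precision)
{
  return static_cast<std::uint8_t>(precision);  // -1 wraps around to 255
}

// Hypothetical stand-in for the predicate logic: the wide 128-bit value is
// used whenever the precision exceeds what fits into 64 bits.
inline bool usesWideValue(std::uint8_t precision)
{
  return precision > kInt64MaxPrecision;
}
```

A precision of -1 becomes 255 once stored, so `usesWideValue` wrongly reports true for a narrow column; storing 18 instead keeps the narrow path.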
Alexey Antipovsky
475104e4d3 [MCOL-4709] Disk-based aggregation
* Introduce multigeneration aggregation

* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)

* Reduce memory usage while aggregating
* introduce in-memory generations to better memory utilization

* Try to limit the qty of buckets at a low limit

* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory

* do not dump more than half of rowgroups to disk if generations are
  allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
  so the generations start more evenly

* Unify temp data location

* Explicitly create temp subdirectories
  whether disk aggregation/join are enabled or not
2021-06-06 16:09:15 +03:00
Alexander Barkov
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a flaw in the MariaDB headers.
  It should be fixed in the server code. In some cases my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  The former is not CMake-compatible style; the latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.

Observations:
- It's OK to include both mcsconfig.h and my_config.h providing that we
  suppress duplicate definition of the above conflicting types #1 and #2.
- There is no need to suppress the duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuration header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceding configuration header.

This change makes sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (with their later restoring).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now be included only after my_config.h.
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be included only after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcsconfig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"
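One common way to "remember" and "restore" a macro on a preprocessor stack is the `#pragma push_macro` / `#pragma pop_macro` pair, supported by GCC, Clang and MSVC. Whether the helper headers in this commit work exactly this way is an assumption; the sketch below only illustrates the remember / undefine / restore sequence, with made-up values for PACKAGE_NAME and hypothetical function names.

```cpp
#include <cstring>

// Stand-in for a definition coming from the first configuration header.
#define PACKAGE_NAME "server"

#pragma push_macro("PACKAGE_NAME")  // remember the current definition
#undef PACKAGE_NAME                 // undefine it to avoid a redefinition clash

// Stand-in for the second configuration header defining the same name.
#define PACKAGE_NAME "engine"
inline const char* nameSeenInside()
{
  return PACKAGE_NAME;  // expands to the second header's value
}

#pragma pop_macro("PACKAGE_NAME")   // restore the remembered definition
inline const char* nameSeenAfter()
{
  return PACKAGE_NAME;  // expands to the first header's value again
}
```

Code between the push and the pop sees the second definition, and code after the pop sees the original one, so neither header's consumers are affected by the other.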

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the compilation failure was fixed, it turned out that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
Alexander Barkov
765858bc5b MCOL-4498 LIKE is not collation aware 2021-03-22 20:42:01 +04:00
benthompson15
afa88866bb MCOL-4483: Fix and consolidate log files and cpimport logging. 2021-02-12 15:40:16 -06:00
Alexander Barkov
69da915160 MCOL-4531 New string-to-decimal conversion implementation
This change fixes:

MCOL-4462 CAST(varchar_expr AS DECIMAL(M,N)) returns a wrong result
MCOL-4500 Bit functions processing throws internally trying to cast char into decimal representation
MCOL-4532 CAST(AS DECIMAL) returns a garbage for large values

Also, this change makes string-to-decimal conversion 5-10 times faster,
depending on exact data.
The performance improvement comes from the fact that, unlike the old
implementation, the new version does not copy any "string" objects.
2021-02-09 13:02:27 +04:00
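To illustrate what string-to-decimal conversion without copying string objects can look like, here is a minimal sketch that walks a `std::string_view` in place. It is not the implementation from this commit: `parseScaled` is a hypothetical name, extra fractional digits are truncated rather than rounded, and overflow is not checked.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper: converts text such as "-12.5" into an integer scaled
// by 10^scale (1250 for scale 2, negated). Returns nullopt on malformed input.
inline std::optional<std::int64_t> parseScaled(std::string_view s, int scale)
{
  std::size_t i = 0;
  bool negative = false;
  if (i < s.size() && (s[i] == '+' || s[i] == '-'))
  {
    negative = s[i] == '-';
    ++i;
  }
  std::int64_t value = 0;
  int fracDigits = 0;
  bool seenPoint = false;
  bool seenDigit = false;
  for (; i < s.size(); ++i)
  {
    const char c = s[i];
    if (c == '.' && !seenPoint)
    {
      seenPoint = true;
    }
    else if (c >= '0' && c <= '9')
    {
      seenDigit = true;
      if (seenPoint && fracDigits == scale)
        continue;  // extra fractional digits are dropped, not rounded
      value = value * 10 + (c - '0');
      if (seenPoint)
        ++fracDigits;
    }
    else
    {
      return std::nullopt;  // unexpected character or second decimal point
    }
  }
  if (!seenDigit)
    return std::nullopt;
  for (; fracDigits < scale; ++fracDigits)
    value *= 10;  // pad missing fractional digits with zeros
  return negative ? -value : value;
}
```

The function reads the characters through the view and never allocates, which is the property the message credits for the speed-up.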
Roman Nozdrin
5fce19df0a MCOL-4412 Introduce TypeHandler::getEmptyValueForType to return const ptr for an empty value
WE changes for SQL DML and DDL operations

Changes for bulk operations

Changes for scanning operations

Cleanup
2021-01-18 12:30:17 +00:00