mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-08-05 16:15:50 +03:00
Commit Graph

329 Commits

Author SHA1 Message Date
Roman Nozdrin
e174696351 MCOL-5001 This patch merges ExeMgr and PrimProc runtimes
EM and PP are the most resource-hungry runtimes.
The merge makes it possible to control their cumulative
resource consumption and thread allocation, and enables
zero-copy data exchange between local EM and PP facilities.
2022-04-04 11:46:33 +00:00
Roman Nozdrin
7cdc914b4e MCOL-4809 This patch introduces vectorized scanning/filtering for short CHAR/VARCHAR columns
Short CHAR/VARCHAR column values contain integer-encoded strings.
After certain manipulations (orderSwap(strnxfrm(str))) the values
become integers that preserve the order relation of the original strings
according to the translation rules of a collation. The prepared
values are ready to be SIMD-processed.
2022-04-01 10:28:33 +00:00
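The order-preserving integer encoding described in the MCOL-4809 message above can be illustrated with a minimal sketch. It assumes a binary collation, for which the strnxfrm step leaves the bytes unchanged, so only the byte-order step remains. The name packShortString is hypothetical and is not taken from the engine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not the engine's function): packs a string of up to
// 8 bytes into a uint64_t with the first character in the most significant
// byte. Unsigned integer comparison of two packed values then agrees with
// byte-wise comparison of the original strings.
inline uint64_t packShortString(std::string_view s)
{
  assert(s.size() <= 8);
  uint64_t v = 0;
  for (std::size_t i = 0; i < s.size(); ++i)
  {
    // Go through unsigned char so bytes >= 0x80 are not sign-extended.
    v |= static_cast<uint64_t>(static_cast<unsigned char>(s[i])) << (8 * (7 - i));
  }
  return v;
}
```

Once values have this shape, a filter such as `col < 'abd'` reduces to an unsigned integer comparison, which is the kind of operation a SIMD lane can evaluate directly.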
Leonid Fedorov
65252df4f6 C++20 fixes 2022-03-28 12:32:29 +00:00
Leonid Fedorov
29679e91ec Clang warnfixes (#2310) 2022-03-21 13:19:55 -05:00
David Hall
a98834e31c MCOL-4841 Fix for MCOL-5009
respondWait could be set to false
while other threads were waiting. With respondWait false, okToRespond
would never get a notify_one(). Get rid of respondWait and use
fProcessorPool->blockedThreadCount to determine if any threads may be
waiting.
2022-03-07 15:25:00 -06:00
Serguey Zefirov
53b9a2a0f9 MCOL-4580 extent elimination for dictionary-based text/varchar types
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.

The actual patch has all the code in place but misses one important
step: we do not keep the collation index, we keep the charset index. Because of
this, some of the tests in the bugfix suite fail and thus the main
functionality is turned off.

The reason this patch is put into a PR at all is that it contains
changes that make CHAR/VARCHAR columns unsigned. This change is needed for
the vectorization work.
2022-03-02 23:53:39 +03:00
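The prefix idea from the MCOL-4580 message above can be sketched roughly as follows, assuming byte-wise (binary) ordering instead of a real collation. The names encodePrefix, ExtentRange and canSkipForEquality are hypothetical and only illustrate the range check.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: encodes the first 8 bytes of a string as a big-endian
// integer. The mapping is monotone: if a <= b byte-wise, then
// encodePrefix(a) <= encodePrefix(b).
inline uint64_t encodePrefix(std::string_view s)
{
  uint64_t v = 0;
  const std::size_t n = std::min<std::size_t>(s.size(), 8);
  for (std::size_t i = 0; i < n; ++i)
    v |= static_cast<uint64_t>(static_cast<unsigned char>(s[i])) << (8 * (7 - i));
  return v;
}

// Hypothetical per-extent metadata: smallest and largest encoded prefix.
struct ExtentRange
{
  uint64_t minPrefix;
  uint64_t maxPrefix;
};

// For `col = literal`: the extent can be skipped when the literal's prefix
// lies outside the extent's prefix range. A prefix inside the range proves
// nothing, so the extent must still be scanned in that case.
inline bool canSkipForEquality(const ExtentRange& r, std::string_view literal)
{
  const uint64_t p = encodePrefix(literal);
  return p < r.minPrefix || p > r.maxPrefix;
}
```

Because only a prefix is kept, two different long strings may share the same code. The check is therefore conservative: it may keep an extent that has no match, but it never drops one that does.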
Roman Nozdrin
86e495ae2f MCOL-4809 This fixes a CentOS crash caused by the combination of ::reserve() + vector index access 2022-02-23 12:38:22 +00:00
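The commit above does not show how the crash was fixed. As a general illustration of the hazard it names, the sketch below uses resize() where reserve() would be wrong; makeSequence is a hypothetical function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: fills a vector through operator[].
inline std::vector<int> makeSequence(std::size_t n)
{
  std::vector<int> v;
  // v.reserve(n) would only allocate capacity; size() would still be 0 and
  // v[i] would be undefined behaviour. resize(n) creates n elements.
  v.resize(n);
  for (std::size_t i = 0; i < n; ++i)
    v[i] = static_cast<int>(i);
  return v;
}
```

Whether the reserve() variant crashes or appears to work depends on how the standard library was built, which would be consistent with a failure seen on only one platform.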
Leonid Fedorov
3919c541ac New warnfixes (#2254)
* Fix clang warnings

* Remove vim tab guides

* initialize variables

* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length

* Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison

* chars are unsigned on ARM, making if (ival < 0) always false

* chars are unsigned by default on ARM, so a comparison with -1 is always true
2022-02-17 13:08:58 +03:00
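The two char-signedness items in the warnfix list above can be illustrated with a small sketch. The functions are hypothetical and are not taken from the patch; they show one portable way to write such checks.

```cpp
// Hypothetical example: detects a byte with the high bit set. Writing
// `c < 0` gives the same answer only where plain char is signed; on ARM
// Linux targets plain char is unsigned and `c < 0` is always false.
inline bool hasHighBit(char c)
{
  return static_cast<unsigned char>(c) >= 0x80;
}

// Hypothetical example: a -1 sentinel needs an explicitly signed type.
// With plain (unsigned) char, `c == -1` is always false and `c != -1`
// is always true.
inline bool isSentinel(signed char c)
{
  return c == -1;
}
```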
David Hall
27dea733c5 MCOL-4841 dev port run large join without OOM 2022-02-09 17:33:55 -06:00
Roman Nozdrin
c79dfc4925 MCOL-4809 This patch adds support for float data types filtering and scanning vectorization 2022-02-03 16:38:56 +00:00
Leonid Fedorov
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
Leonid Fedorov
5af2e70712 circular header fix 2022-01-21 15:24:58 +00:00
Leonid Fedorov
01f3ceb437 replace header guards with #pragma once 2022-01-21 15:24:58 +00:00
Roman Nozdrin
05897948e4 MCOL-4899 MCS now applies a correct collation running IN for character data types 2022-01-05 12:00:01 +00:00
Roman Nozdrin
7b5845a4aa MCOL-4871 Bar's patch to do proper extent elimination for short CHAR 2021-12-17 17:41:03 +00:00
Roman Nozdrin
54a5623569 MCOL-4809 Review suggestions patch 2021-12-10 10:30:08 +00:00
Roman Nozdrin
af36f9940f This patch introduces support for vectorized scanning/filtering execution for numeric-based
data types. TEXT, CHAR, VARCHAR, FLOAT and DOUBLE are not yet supported by the vectorized path.
This patch also introduces an example for the Google benchmarking suite to measure the perf diff
between the legacy scan/filtering code and the templated version.
2021-12-10 10:30:00 +00:00
Leonid Fedorov
1973168e03 c++17 fix 2021-10-29 14:57:11 +00:00
Roman Nozdrin
3de038c1da MCOL-4876 This patch enables a continuous buffer to be used by ColumnCommand and aligns BPP::blockData,
which in most cases was unaligned
2021-10-06 09:23:40 +00:00
Roman Nozdrin
67c85dae15 MCOL-4809 The patch replaces legacy scanning/filtering code with a number of templates that
simplify control flow by removing needless expressions
2021-09-06 17:04:52 +00:00
Leonid Fedorov
5c5f103f98 MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
Alexander Barkov
c16b0f6ad7 MCOL-4823 WHERE char_col<varchar_col returns a wrong result on a large table (#2060)
SCommand StrFilterCmd::duplicate() missed these two lines:

    filterCmd->leftColType = leftColType;
    filterCmd->rightColType = rightColType;

which exist in the parent's FilterCommand::duplicate().

Rewriting the code to avoid duplication by using more inherited
methods/constructors. This reduces the probability of similar bugs
in the future.
2021-08-03 11:53:05 +03:00
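The bug class described in the MCOL-4823 message above, a hand-written duplicate() that forgets members, can be avoided by delegating to copy constructors. The sketch below uses simplified stand-in classes with the same names, std::unique_ptr where the engine uses SCommand, and a hypothetical extraFlag member; it is not the engine's code.

```cpp
#include <memory>

// Simplified stand-in for the parent command class.
struct FilterCommand
{
  int leftColType;
  int rightColType;

  FilterCommand(int left, int right) : leftColType(left), rightColType(right) {}
  virtual ~FilterCommand() = default;

  // The implicit copy constructor copies every member, so members added
  // later are picked up without touching duplicate().
  virtual std::unique_ptr<FilterCommand> duplicate() const
  {
    return std::make_unique<FilterCommand>(*this);
  }
};

// Simplified stand-in for the derived command class.
struct StrFilterCmd : FilterCommand
{
  bool extraFlag = false;  // hypothetical derived-only member

  StrFilterCmd(int left, int right) : FilterCommand(left, right) {}

  // Copy-constructs the derived object, base members included.
  std::unique_ptr<FilterCommand> duplicate() const override
  {
    return std::make_unique<StrFilterCmd>(*this);
  }
};
```

With this shape there is no member-by-member assignment left to forget, which is the property the commit message aims for when it speaks of reducing the probability of similar bugs.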
Roman Nozdrin
a292585b8c MCOL-4815 ColumnCommand was replaced with a set of derived classes specified by
column width

RTSCommand was modified to use a factory that produces a CC class based on column width

NB this patch doesn't affect PseudoCC that also leverages ColumnCommand
2021-07-21 12:54:14 +00:00
Gagan Goel
b3a560300c Revert "Merge pull request #2022 from mariadb-corporation/bar-develop-MCOL-4791"
This reverts commit 4016e25e5b, reversing
changes made to 85435f6b1e.
2021-07-13 11:06:56 +00:00
Gagan Goel
90e5218c71 Revert "Merge pull request #2027 from mariadb-corporation/bar-develop-MCOL-4791"
This reverts commit 643c06b7fe, reversing
changes made to c679b11ef6.
2021-07-13 10:56:23 +00:00
Roman Nozdrin
866dc25729 Merge pull request #1842 from denis0x0D/MCOL-987_LZ
MCOL-987 LZ4 compression support.
2021-07-07 13:13:18 +03:00
Roman Nozdrin
7b4f759592 Merge pull request #2032 from drrtuy/MCOL-4802
MCOL-4802 Removed ByteStream methods for bool and add some logging in…
2021-07-07 13:03:54 +03:00
Roman Nozdrin
fb5ba84212 MCOL-4802 Removed ByteStream methods for bool manipulations and add some logging into I_S.columnstore_files 2021-07-07 07:16:30 +00:00
Alexander Barkov
9794f24369 MCOL-4801 Replace Row methods getStringLength() and getStringPointer() with getConstString() 2021-07-06 21:15:32 +04:00
Denis Khalikov
cc1c3629c5 MCOL-987 Add LZ4 compression.
* Adds CompressInterfaceLZ4 which uses LZ4 API for compress/uncompress.
* Adds CMake machinery to search for LZ4 on the running host.
* All methods which use static data and do not modify any internal data - become `static`,
  so we can use them without creation of the specific object. This is possible, because
  the header specification has not been modified. We still use 2 sections in header, first
  one with file meta data, the second one with pointers for compressed chunks.
* Methods `compress`, `uncompress`, `maxCompressedSize`, `getUncompressedSize` - become
  pure virtual, so we can override them for the other compression algos.
* Adds method `getChunkMagicNumber`, so we can verify chunk magic number
  for each compression algo.
* Renames "s/IDBCompressInterface/CompressInterface/g" according to requirement.
2021-07-06 18:04:37 +03:00
Gagan Goel
8520f87237 MCOL-641 Cleanup. 2021-07-06 09:01:49 +00:00
Alexander Barkov
46ee6b3968 A clean-up for MCOL-4791 Fix ColumnCommand fudged data type format to clearly identify CHAR vs VARCHAR
The "mIsDict" member was not copied in ColumnCommand::duplicate.
Thus store() erroneously entered colCompare() for dictionary commands.
2021-07-04 09:48:33 +04:00
Alexander Barkov
e8126bede5 MCOL-4791 Fix ColumnCommand fudged data type format to clearly identify CHAR vs VARCHAR 2021-07-02 12:42:03 +04:00
Roman Nozdrin
2de4888899 Merge pull request #1990 from drrtuy/MCOL-4173_9
MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI…
2021-06-24 16:15:07 +03:00
Roman Nozdrin
6620d873fd Merge pull request #1927 from denis0x0D/MCOL-4407
MCOL-4407 and condition does not work when HWM > columnstore_string_scan_threshold - 1
2021-06-24 15:58:35 +03:00
Roman Nozdrin
bed0b7c6bc MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, functional JOINs
based on top of TypelessData
2021-06-24 08:07:23 +00:00
Denis Khalikov
8dd2f2937c MCOL-4407 and condition does not work when HWM > columnstore_string_scan_threshold - 1 2021-06-21 14:04:26 +03:00
Alexander Barkov
b3d6f62964 MCOL-4753 Performance problem in Typeless join 2021-06-10 09:26:26 +00:00
Roman Nozdrin
42e710f817 Merge pull request #1942 from mariadb-corporation/bar-develop-compile-10.6
Fixing 10.6 + develop compilation failure
2021-05-25 14:31:37 +03:00
Alexander Barkov
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a flaw in the MariaDB headers.
  It should be fixed in the server code. In some cases my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  The former is not CMake-compatible style. The latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.

Observations:
- It's OK to include both mcsconfig.h and my_config.h provided that we
  suppress duplicate definitions of the above conflicting types #1 and #2.
- There is no need to suppress duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuration header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceding configuration header.

This change makes sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (and restoring them later).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now be included only after my_config.h.
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be included only after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcsconfig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the fix of the compilation failure it appeared that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
Alexander Barkov
284fc51bb7 MCOL-4726 Wrong result of WHERE char1_col='A' 2021-05-21 14:40:16 +04:00
Roman Nozdrin
757f8d00a5 A pluggable PoorManProfiler singleton 2021-04-14 10:54:46 +00:00
Gagan Goel
3ed1b26a2a Merge pull request #1856 from mariadb-corporation/bar-develop-MCOL-4361
MCOL-4361 Replace pow(10.0, (double)scale) expressions with a static …
2021-04-13 07:01:33 -04:00
Alexander Barkov
362bfcd15e MCOL-4361 Replace pow(10.0, (double)scale) expressions with a static dictionary lookup. 2021-04-09 12:41:04 +04:00
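The replacement named in the MCOL-4361 title above can be pictured as a small static table of powers of ten. The sketch below is an assumption about the general shape, not the engine's code; pow10Lookup and the table size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns 10^scale from a static table and avoids a
// pow() call on every row. All listed powers are exactly representable as
// double.
inline double pow10Lookup(std::size_t scale)
{
  static constexpr double kPow10[] = {1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,
                                      1e7,  1e8,  1e9,  1e10, 1e11, 1e12, 1e13,
                                      1e14, 1e15, 1e16, 1e17, 1e18};
  assert(scale < sizeof(kPow10) / sizeof(kPow10[0]));
  return kPow10[scale];
}
```

A call site that previously computed `value / pow(10.0, (double)scale)` would then read `value / pow10Lookup(scale)`.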
Roman Nozdrin
895cbbe2d1 This patch revives PP poorman's profiling using StopWatch class 2021-04-08 12:10:06 +00:00
Alexander Barkov
765858bc5b MCOL-4498 LIKE is not collation aware 2021-03-22 20:42:01 +04:00
David.Hall
b35e1ee395 Merge pull request #1769 from mariadb-corporation/bar-develop-MCOL-4527
A join patch for MCOL-4527 (a performance hack) and MCOL-4539 (a bug …
2021-02-17 13:31:35 -06:00
Alexander Barkov
5bcc1cd1f0 A join patch for MCOL-4527 (a performance hack) and MCOL-4539 (a bug fix)
- MCOL-4527 Simple query performance is degraded between 5.4 and 5.5

  xxx_nopad_bin collations are now around 30% faster on simple queries like:

    SELECT * FROM t1 WHERE short_char_column_nopad_bin = 'literal'

  The gain is achieved by comparing two short CHAR values as uint64_t.

  Note, this patch does not affect xxx_bin collations!
  It wouldn't be correct to apply the same improvement for xxx_bin
  collations (i.e. with PAD SPACE attribute), because it would change
  the way trailing spaces are compared.

- MCOL-4539 WHERE short_char_column='literal' ignores the collation on a huge table

  Only the first thread used a correct collation when performing:
    WHERE short_char_column='literal'
  Other (15) threads used the server default collation, because
  the charsetNumber attribute was not copied during cloning.

- This patch also adds mtr/basic/suite.opt, so "mtr" can run without --extern.
2021-02-16 18:45:18 +04:00
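The MCOL-4527 optimisation described above, comparing two short CHAR values as uint64_t, can be sketched as follows. The sketch assumes each value occupies a zero-padded 8-byte slot; ShortChar, toShortChar and equalNoPadBin are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>

// Assumption: a short CHAR value lives in a zero-padded 8-byte slot.
using ShortChar = std::array<char, 8>;

// Hypothetical helper that builds such a slot from a string.
inline ShortChar toShortChar(std::string_view s)
{
  assert(s.size() <= 8);
  ShortChar out{};  // value-initialised, so unused bytes are zero
  std::memcpy(out.data(), s.data(), s.size());
  return out;
}

// Equality for a NO PAD binary collation: every byte is significant, so
// two values are equal exactly when their 8-byte images are equal.
inline bool equalNoPadBin(const ShortChar& a, const ShortChar& b)
{
  uint64_t x;
  uint64_t y;
  std::memcpy(&x, a.data(), sizeof x);  // memcpy avoids aliasing problems
  std::memcpy(&y, b.data(), sizeof y);
  return x == y;
}
```

Under a PAD SPACE collation `'ab'` and `'ab '` must compare equal, yet their 8-byte images differ. That is why the message says the shortcut is not applied to xxx_bin collations.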
benthompson15
afa88866bb MCOL-4483: Fix and consolidate log files and cpimport logging. 2021-02-12 15:40:16 -06:00
benthompson15
846f7fb29b MCOL-4193: Delete unused OAM and applications, ProcMon, ProcMgr, and no longer build all tools for packages 2021-02-08 17:51:09 -06:00