mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-04-21 19:45:56 +03:00

75 Commits

Author SHA1 Message Date
Sergey Zefirov
b53c231ca6 MCOL-271 empty strings should not be NULLs (#2794)
This patch improves the handling of NULLs in textual fields in ColumnStore.
Previously, empty strings were treated as NULLs, which could be a problem
if the data schema allows empty strings. It was also one of the major
reasons for behavior differences between ColumnStore and other engines in
the MariaDB family.

Also, this patch fixes some other bugs and incorrect behavior, for
example, the incorrect comparison "column <= ''", which evaluated to
constant True for all purposes before this patch.
2023-03-30 21:18:29 +03:00
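The distinction this commit describes can be illustrated with a minimal sketch. It is not ColumnStore code: `NullableText`, `lessOrEqual` and `rowPasses` are hypothetical names, and the comparison is a plain byte-wise one that ignores collations. A NULL is modeled as an empty `std::optional`, while the empty string is an ordinary value.

```cpp
#include <optional>
#include <string>

// Hypothetical model of a nullable text value: std::nullopt stands for
// SQL NULL, while an engaged optional holding "" is the empty string.
using NullableText = std::optional<std::string>;

// SQL-style "lhs <= rhs": the result is UNKNOWN (no value) when either
// side is NULL, otherwise a byte-wise comparison (collations ignored).
std::optional<bool> lessOrEqual(const NullableText& lhs, const NullableText& rhs)
{
  if (!lhs || !rhs)
    return std::nullopt;
  return *lhs <= *rhs;
}

// A WHERE clause keeps a row only when the predicate is definitely true.
bool rowPasses(const NullableText& column, const NullableText& literal)
{
  return lessOrEqual(column, literal).value_or(false);
}
```

Under this model `column <= ''` is true only for rows holding the empty string, false for non-empty strings, and unknown for NULL rows, so it can no longer collapse into a constant.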
Leonid Fedorov
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
Roman Nozdrin
878a8ab857 Compilation error fixes for the recent updates in container images 2022-10-14 17:53:48 +03:00
NTH19
7d76dc4534 AUX column scan (MCOL-5021) effectively disables vectorized scanning on
ARM platforms. This patch resolves this issue and unifies AUX column
processing on x86 and ARM using the template class SimdProcessor.
The patch also replaces the uint16_t mask previously used in column.cpp and
SimdProcessor code with the native masks that each platform uses, e.g. __m128i
or __m128 on x86 and a variety of masks on ARM.
To unify the processing I introduced a new filtering Compare Operator, COMPARE_NULLEQ,
with 'c1 IS NULL' semantics.
2022-10-07 10:32:54 +00:00
Gagan Goel
6a6fee5969 MCOL-5021 Followup.
Allow the compiler to inline the call to nextColValue() in column.cpp.
2022-08-18 19:35:35 +00:00
Gagan Goel
cbfdae3481 MCOL-5021 Code changes based on review feedback. 2022-08-05 14:40:50 -04:00
Gagan Goel
262cd5c501 MCOL-5021 Remove hard-coded values for data type, column width
and compression type for the AUX column, and replace them with
constants defined in the execplan namespace.
2022-08-05 14:40:49 -04:00
Gagan Goel
c8b6b154bf MCOL-5021 Add an option in Columnstore.xml, fastdelete (disabled
by default), which when enabled, indiscriminately invalidates all
column extents and performs the actual DELETE only on the AUX
column. The trade-off with this approach would now be that the
first SELECT for certain query patterns (those containing a WHERE
predicate) after the DELETE operation will slow down as the
invalidated column extent would need to be scanned again to set
the min/max values.
2022-08-05 14:40:49 -04:00
Gagan Goel
2280b1dd25 MCOL-5021 Add support for the AUX column in ExeMgr and PrimProc.
In the joblist code, in addition to sending the lbid of the SCAN
column, we also send the corresponding lbid of the AUX column to PrimProc.

In the primitives processor code in PrimProc, we load the AUX column
block (8192 rows since the AUX column is implemented as a 1-byte
UNSIGNED TINYINT) into memory and then pass it down to the low-level
scanning (vectorized scanning as applicable) routine to build a non-Empty
mask for the block being processed to filter out DELETED rows based on
comparison of the AUX block row to the empty magic value for the AUX column.
2022-08-05 14:40:49 -04:00
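The non-Empty mask described in this commit can be sketched in scalar form. This is only an illustration under assumptions: `kAuxEmptyMagic` is a placeholder value rather than the real empty magic value from the ColumnStore sources, and the function names are hypothetical. The actual code uses vectorized routines where applicable.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder marker for a deleted (empty) row. The real empty magic value
// of the AUX column is defined in the ColumnStore sources; 0xFF is assumed.
constexpr uint8_t kAuxEmptyMagic = 0xFF;

// Build a bitmask with one bit per row: a set bit means the row is live,
// a cleared bit means the AUX value equals the empty magic value.
std::vector<uint64_t> buildNonEmptyMask(const uint8_t* aux, std::size_t rows)
{
  std::vector<uint64_t> mask((rows + 63) / 64, 0);
  for (std::size_t i = 0; i < rows; ++i)
  {
    if (aux[i] != kAuxEmptyMagic)
      mask[i / 64] |= uint64_t{1} << (i % 64);
  }
  return mask;
}

// Count the rows that survive the mask.
std::size_t countLiveRows(const std::vector<uint64_t>& mask)
{
  std::size_t live = 0;
  for (uint64_t word : mask)
  {
    for (; word != 0; word &= word - 1)  // clear the lowest set bit
      ++live;
  }
  return live;
}
```

For a full block of 8192 one-byte AUX values the mask occupies 128 64-bit words, and the scan of the data column can skip every row whose bit is cleared.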
Andrey Piskunov
c3a5731890 Rename cmpGt2 2022-08-04 16:16:38 +03:00
Andrey Piskunov
24b2c1c283 Vectorizing min/max for KIND_TEXT 2022-08-04 16:16:38 +03:00
NTH19
231930b71d update 2022-08-04 16:16:38 +03:00
NTH19
19ca844cd1 support_max_min 2022-08-04 16:16:38 +03:00
Andrey Piskunov
bcb89e00f4 Remove include 2022-08-04 16:16:38 +03:00
Andrey Piskunov
589b786fda Don't ignore null or empty in calculation 2022-08-04 16:16:38 +03:00
Andrey Piskunov
5c6cd2cca3 use vect update for everything except TEXT 2022-08-04 16:16:38 +03:00
Andrey Piskunov
20f48fd730 Vectorized update min max 2022-08-04 16:16:38 +03:00
Andrey Piskunov
f88a3bfc65 Remove include 2022-08-04 16:16:38 +03:00
Andrey Piskunov
b8200acd3b Don't ignore null or empty in calculation 2022-08-04 16:16:38 +03:00
Andrey Piskunov
c4df7925d1 use vect update for everything except TEXT 2022-08-04 16:16:38 +03:00
Andrey Piskunov
9930d0dedd Vectorized update min max 2022-08-04 16:16:38 +03:00
Roman Nozdrin
7cdc914b4e MCOL-4809 This patch introduces vectorized scanning/filtering for short CHAR/VARCHAR columns
Short CHAR/VARCHAR column values contain integer-encoded strings.
After certain manipulations (orderSwap(strnxfrm(str))) the values
become integers that preserve the original strings' order relation
according to certain translation rules (collation). The prepared
values are ready to be SIMD-processed.
2022-04-01 10:28:33 +00:00
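A rough sketch of why the byte swap matters, assuming a binary collation so that the strnxfrm step can be left out. `packShortString` and `shortStringLess` are hypothetical helpers, and this `orderSwap` is a stand-in byte reversal, not the function from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Pack a string of up to 8 bytes into an integer the way a little-endian
// load would: the first character lands in the lowest byte, zero padded.
uint64_t packShortString(std::string_view s)
{
  assert(s.size() <= 8);
  uint64_t value = 0;
  for (std::size_t i = 0; i < s.size(); ++i)
    value |= uint64_t{static_cast<unsigned char>(s[i])} << (8 * i);
  return value;
}

// Stand-in for orderSwap: reverse the byte order so that the first
// character becomes the most significant byte.
uint64_t orderSwap(uint64_t value)
{
  uint64_t result = 0;
  for (int i = 0; i < 8; ++i)
  {
    result = (result << 8) | (value & 0xFF);
    value >>= 8;
  }
  return result;
}

// After the swap, unsigned integer order matches byte-wise string order.
bool shortStringLess(std::string_view a, std::string_view b)
{
  return orderSwap(packShortString(a)) < orderSwap(packShortString(b));
}
```

Without the swap the packed value of "ab" is larger than that of "b", although "ab" sorts first. With it, a single unsigned integer comparison gives the string order, which is what makes the values suitable for SIMD lanes.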
Leonid Fedorov
29679e91ec Clang warnfixes (#2310) 2022-03-21 13:19:55 -05:00
Serguey Zefirov
53b9a2a0f9 MCOL-4580 extent elimination for dictionary-based text/varchar types
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.

The actual patch does have all the code there but misses one important
step: we do not keep the collation index, we keep the charset index. Because of
this, some of the tests in the bugfix suite fail and thus the main
functionality is turned off.

The reason this patch is put into a PR at all is that it contains
changes that make CHAR/VARCHAR columns unsigned. This change is needed for
the vectorization work.
2022-03-02 23:53:39 +03:00
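The prefix idea can be sketched as follows, assuming a binary collation (a real implementation would encode the collated form of the string) and an 8-byte prefix. `encodePrefix` and `ExtentRange` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>

// Encode the first 8 bytes of a string as a big-endian integer, zero padded,
// so that integer order follows byte-wise string order of the prefixes.
uint64_t encodePrefix(std::string_view s)
{
  uint64_t key = 0;
  for (std::size_t i = 0; i < 8; ++i)
  {
    const unsigned char byte = i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
    key = (key << 8) | byte;
  }
  return key;
}

// Range of encoded prefixes seen in one extent.
struct ExtentRange
{
  uint64_t min = std::numeric_limits<uint64_t>::max();
  uint64_t max = 0;

  void add(std::string_view value)
  {
    const uint64_t key = encodePrefix(value);
    min = std::min(min, key);
    max = std::max(max, key);
  }

  // For an equality predicate: false means the extent can be skipped,
  // true means it still has to be scanned (false positives are possible).
  bool mayContain(std::string_view literal) const
  {
    const uint64_t key = encodePrefix(literal);
    return min <= key && key <= max;
  }
};
```

Equal strings always have equal prefixes, so a literal whose prefix falls outside the range cannot occur in the extent. The opposite does not hold, which is why the check only eliminates extents and never confirms a match.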
Roman Nozdrin
86e495ae2f MCOL-4809 This fixes Centos crash caused by combination of ::reserve() + vector index access 2022-02-23 12:38:22 +00:00
Leonid Fedorov
3919c541ac New warnfixes (#2254)
* Fix clang warnings

* Remove vim tab guides

* initialize variables

* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length

* Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison

* chars are unsigned on ARM, making if (ival < 0) always false

* chars are unsigned by default on ARM, so a comparison with -1 is always true
2022-02-17 13:08:58 +03:00
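The last two items concern the signedness of plain char, which is implementation-defined. A small sketch of portable alternatives follows; the helper names are hypothetical and are not taken from the patch.

```cpp
#include <limits>

// True where plain char is signed (common on x86 Linux), false where it is
// unsigned (the default on ARM Linux). Code should not depend on this value.
constexpr bool kPlainCharIsSigned = std::numeric_limits<char>::is_signed;

// Portable replacement for "c < 0": test the high bit explicitly.
bool hasHighBit(char c)
{
  return (static_cast<unsigned char>(c) & 0x80u) != 0;
}

// Portable sentinel handling: keep -1 in an int, not in a plain char,
// so the comparison behaves the same on every platform.
constexpr int kNoValue = -1;

bool isNoValue(int ival)
{
  return ival == kNoValue;
}
```

Holding the sentinel in an int avoids the case where a plain char can never equal -1 on a platform with unsigned chars.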
Roman Nozdrin
c79dfc4925 MCOL-4809 This patch adds support for float data types filtering and scanning vectorization 2022-02-03 16:38:56 +00:00
Leonid Fedorov
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
Roman Nozdrin
7b5845a4aa MCOL-4871 Bar's patch to do proper extent elimination for short CHAR 2021-12-17 17:41:03 +00:00
Roman Nozdrin
54a5623569 MCOL-4809 Review suggestions patch 2021-12-10 10:30:08 +00:00
Roman Nozdrin
af36f9940f This patch introduces support for vectorized scanning/filtering execution for numeric-based
data types. TEXT, CHAR, VARCHAR, FLOAT and DOUBLE are not yet supported by the vectorized path.
This patch also introduces an example for the Google benchmarking suite to measure the performance difference
between the legacy scan/filtering code and the templated version
2021-12-10 10:30:00 +00:00
Roman Nozdrin
3de038c1da MCOL-4876 This patch enables a continuous buffer to be used by ColumnCommand and aligns BPP::blockData,
which in most cases was unaligned
2021-10-06 09:23:40 +00:00
Roman Nozdrin
67c85dae15 MCOL-4809 The patch replaces legacy scanning/filtering code with a number of templates that
simplify control flow by removing needless expressions
2021-09-06 17:04:52 +00:00
Leonid Fedorov
5c5f103f98 MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
Gagan Goel
b3a560300c Revert "Merge pull request #2022 from mariadb-corporation/bar-develop-MCOL-4791"
This reverts commit 4016e25e5ba4f727e19180b62ef405d2f5ef9c1e, reversing
changes made to 85435f6b1e733ce1e63717c2ad7cfc963b82f343.
2021-07-13 11:06:56 +00:00
Gagan Goel
8520f87237 MCOL-641 Cleanup. 2021-07-06 09:01:49 +00:00
Alexander Barkov
e8126bede5 MCOL-4791 Fix ColumnCommand fudged data type format to clearly identify CHAR vs VARCHAR 2021-07-02 12:42:03 +04:00
Alexander Barkov
284fc51bb7 MCOL-4726 Wrong result of WHERE char1_col='A' 2021-05-21 14:40:16 +04:00
Alexander Barkov
765858bc5b MCOL-4498 LIKE is not collation aware 2021-03-22 20:42:01 +04:00
Alexander Barkov
5bcc1cd1f0 A join patch for MCOL-4527 (a performance hack) and MCOL-4539 (a bug fix)
- MCOL-4527 Simple query performance is degraded between 5.4 and 5.5

  xxx_nopad_bin collations are now around 30% faster on simple queries like:

    SELECT * FROM t1 WHERE short_char_column_nopad_bin = 'literal'

  The gain is achieved by comparing two short CHAR values as uint64_t.

  Note, this patch does not affect xxx_bin collations!
  It wouldn't be correct to apply the same improvement to xxx_bin
  collations (i.e. with the PAD SPACE attribute), because it would change
  the way trailing spaces are compared.

- MCOL-4539 WHERE short_char_column='literal' ignores the collation on a huge table

  Only the first thread used a correct collation when performing:
    WHERE short_char_char='literal'
  Other (15) threads used the server default collation, because
  the charsetNumber attribute was not copied during cloning.

- This patch also adds mtr/basic/suite.opt, so "mtr" can run without --extern.
2021-02-16 18:45:18 +04:00
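The MCOL-4527 note above can be illustrated with a small sketch. It assumes values of at most 8 bytes without embedded NUL bytes, and the function names are hypothetical. It shows why a single uint64_t comparison is valid for NO PAD binary collations but not for PAD SPACE ones.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string_view>

// Copy up to 8 bytes into a zero-initialized 64-bit integer.
uint64_t packShortChar(std::string_view s)
{
  assert(s.size() <= 8);
  uint64_t value = 0;
  if (!s.empty())
    std::memcpy(&value, s.data(), s.size());
  return value;
}

// NO PAD binary collation: every byte is significant, so equality of the
// packed integers is equality of the strings.
bool equalNoPadBin(std::string_view a, std::string_view b)
{
  return packShortChar(a) == packShortChar(b);
}

// PAD SPACE binary collation: trailing spaces are not significant, so they
// must be stripped before comparing and a raw integer compare is not enough.
bool equalPadSpaceBin(std::string_view a, std::string_view b)
{
  while (!a.empty() && a.back() == ' ')
    a.remove_suffix(1);
  while (!b.empty() && b.back() == ' ')
    b.remove_suffix(1);
  return a == b;
}
```

With these helpers "abc" and "abc " differ under the NO PAD comparison and are equal under the PAD SPACE one, which is the behavior the integer shortcut would otherwise change.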
Gagan Goel
a91fb15b07 Add PrimProc support for selective block loading for 16-byte columns. 2020-12-11 14:23:45 -05:00
Alexander Barkov
52c5af054a Part#2 MCOL-495 Make string comparison not case sensitive
Fixing field='str' for short (non-Dict) CHAR and VARCHAR data types.
2020-12-04 08:40:29 +04:00
Gagan Goel
c5d4a918ee MCOL-4188 Regression fixes for MCOL-641.
1. In TupleAggregateStep::configDeliveredRowGroup(), use
jobInfo.projectionCols instead of jobInfo.nonConstCols
for setting scale and precision if the source column is
wide decimal.

2. Tighten rules for wide decimal processing. Specifically:
  a. Replace (precision > INT64MAXPRECISION) checks with
     (precision > INT64MAXPRECISION && precision <= INT128MAXPRECISION)
  b. At places where (colWidth == MAXDECIMALWIDTH) is not enough to
     determine if a column is wide decimal or not, also add a check on
     type being DECIMAL/UDECIMAL.
2020-11-24 20:15:33 -05:00
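The tightened rules in item 2 can be written down as two predicates. This is a sketch, not the patched code: the constant values (18, 38 and 16) are assumptions about INT64MAXPRECISION, INT128MAXPRECISION and MAXDECIMALWIDTH, and `ColType` is a hypothetical stand-in for the engine's column type enum.

```cpp
// Assumed values; the real constants live in the ColumnStore sources.
constexpr int kInt64MaxPrecision = 18;   // stands in for INT64MAXPRECISION
constexpr int kInt128MaxPrecision = 38;  // stands in for INT128MAXPRECISION
constexpr int kMaxDecimalWidth = 16;     // stands in for MAXDECIMALWIDTH

// Hypothetical stand-in for the column data type enum.
enum class ColType
{
  Decimal,
  UDecimal,
  Other
};

// Rule 2a: a precision is "wide" only inside the (int64, int128] range,
// not merely above the int64 limit.
constexpr bool isWideDecimalPrecision(int precision)
{
  return precision > kInt64MaxPrecision && precision <= kInt128MaxPrecision;
}

// Rule 2b: a 16-byte width alone is not enough, the type must also be
// DECIMAL or UDECIMAL.
constexpr bool isWideDecimalColumn(ColType type, int colWidth)
{
  return colWidth == kMaxDecimalWidth &&
         (type == ColType::Decimal || type == ColType::UDecimal);
}
```

The upper bound in the first predicate rejects out-of-range precisions, and the type check in the second keeps other 16-byte columns from being treated as wide decimals.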
Roman Nozdrin
15b1bfa709 Fix fallthrough compilation warnings 2020-11-18 13:53:15 +00:00
Roman Nozdrin
3eb26c0d4a MCOL-4313 Introduced TSInt128 that is a storage class for int128
Removed uint128 from joblist/lbidlist.*

Another toString() method for wide-decimal that is EMPTY/NULL aware

Unified decimal processing in WF functions

Fixed a potential issue in EqualCompData::operator() for
    wide-decimal processing

Fixed some signedness warnings
2020-11-18 13:53:15 +00:00
Roman Nozdrin
1588ebe439 MCOL-641 Clean up primitives code
Add int128_t support into ByteStream

Fixed UTs broken after collation patch
2020-11-18 13:52:19 +00:00
Gagan Goel
d3bc68b02f MCOL-641 Refactor initial extent elimination support.
This commit also adds support in TupleHashJoinStep::forwardCPData,
although we currently do not support wide decimals as join keys.

Row estimation to determine large-side of the join is also updated.
2020-11-18 13:52:19 +00:00
Gagan Goel
74b64eb4f1 MCOL-641 1. Add support for int128_t in ParsedColumnFilter.
2. Set Decimal precision in SimpleColumn::evaluate().
3. Add support for int128_t in ConstantColumn.
4. Set IDB_Decimal::s128Value in buildDecimalColumn().
5. Use width 16 as first if predicate for branching based on decimal width.
2020-11-18 13:47:45 +00:00
Gagan Goel
824615a55b MCOL-641 Refactor empty value implementation in writeengine. 2020-11-18 13:47:44 +00:00
Roman Nozdrin
97ee1609b2 MCOL-641 Replaced NULL binary constants.
DataConvert::decimalToString, toString, writeIntPart, writeFractionalPart are not templates anymore.
2020-11-18 13:47:44 +00:00