1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-06-13 16:01:32 +03:00
Commit Graph

76 Commits

Author SHA1 Message Date
a0bee173f6 chore(build): fixes to satisfy clang19 warnings 2025-05-15 19:05:38 +04:00
b53c231ca6 MCOL-271 empty strings should not be NULLs (#2794)
This patch improves handling of NULLs in textual fields in ColumnStore.
Previously empty strings were considered NULLs and it could be a problem
if data scheme allows for empty strings. It was also one of major
reasons of behavior difference between ColumnStore and other engines in
MariaDB family.

Also, this patch fixes some other bugs and incorrect behavior, for
example, incorrect comparison for "column <= ''" which evaluates to
constant True for all purposes before this patch.
2023-03-30 21:18:29 +03:00
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
878a8ab857 Compilation error fixes for the recent updates in container images 2022-10-14 17:53:48 +03:00
7d76dc4534 AUX column scan(MCOL-5021) effectively disables vectorized scanning on
ARM platforms. This patch resolves this issue and unifies AUX column
processing at x86 and ARM using tempate class SimdProcessor.
The patch also replaces uint16_t mask previously used in column.cpp and
SimProcessor code with a native masks that platform uses, e.g. __m128i
or __m128 on x86 and variety of masks on ARM.
To unify the processing I introduced a new filtering Compare Operator - COMPARE_NULLEQ.
with a 'c1 IS NULL semantics'.
2022-10-07 10:32:54 +00:00
6a6fee5969 MCOL-5021 Followup.
Allow the compiler to inline the call to nextColValue() in column.cpp.
2022-08-18 19:35:35 +00:00
cbfdae3481 MCOL-5021 Code changes based on review feedback. 2022-08-05 14:40:50 -04:00
262cd5c501 MCOL-5021 Remove hard-coded values for data type, column width
and compression type for the AUX column, and replace them with
constants defined in the execplan namespace.
2022-08-05 14:40:49 -04:00
c8b6b154bf MCOL-5021 Add an option in Columnstore.xml, fastdelete (disabled
by default), which when enabled, indiscriminately invalidates all
column extents and performs the actual DELETE only on the AUX
column. The trade-off with this approach would now be that the
first SELECT for certain query patterns (those containing a WHERE
predicate) after the DELETE operation will slow down as the
invalidated column extent would need to be scanned again to set
the min/max values.
2022-08-05 14:40:49 -04:00
2280b1dd25 MCOL-5021 Add support for the AUX column in ExeMgr and PrimProc.
In the joblist code, in addition to sending the lbid of the SCAN
column, we also send the corresponding lbid of the AUX column to PrimProc.

In the primitives processor code in PrimProc, we load the AUX column
block (8192 rows since the AUX column is implemented as a 1-byte
UNSIGNED TINYINT) into memory and then pass it down to the low-level
scanning (vectorized scanning as applicable) routine to build a non-Empty
mask for the block being processed to filter out DELETED rows based on
comparison of the AUX block row to the empty magic value for the AUX column.
2022-08-05 14:40:49 -04:00
c3a5731890 Rename cmpGt2 2022-08-04 16:16:38 +03:00
24b2c1c283 Vectorizing min/max for KIND_TEXT 2022-08-04 16:16:38 +03:00
231930b71d update 2022-08-04 16:16:38 +03:00
19ca844cd1 support_max_min 2022-08-04 16:16:38 +03:00
bcb89e00f4 Remove include 2022-08-04 16:16:38 +03:00
589b786fda Don't ignore null or empty in calculation 2022-08-04 16:16:38 +03:00
5c6cd2cca3 use vect update for everything except TEXT 2022-08-04 16:16:38 +03:00
20f48fd730 Vectorized update min max 2022-08-04 16:16:38 +03:00
f88a3bfc65 Remove include 2022-08-04 16:16:38 +03:00
b8200acd3b Don't ignore null or empty in calculation 2022-08-04 16:16:38 +03:00
c4df7925d1 use vect update for everything except TEXT 2022-08-04 16:16:38 +03:00
9930d0dedd Vectorized update min max 2022-08-04 16:16:38 +03:00
7cdc914b4e MCOL-4809 This patch introduces vectorized scanning/filtering for short CHAR/VARCHAR columns
Short CHAR/VARCHAR column values contain integer-encoded strings.
    After certain manipulations(orderSwap(strnxfrm(str))) the values
    become integers that preserve original strings order relation
    according to a certain translation rules(collation). Prepared
    values are ready to be SIMD-processed.
2022-04-01 10:28:33 +00:00
29679e91ec Clang warnfixes (#2310) 2022-03-21 13:19:55 -05:00
53b9a2a0f9 MCOL-4580 extent elimination for dictionary-based text/varchar types
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.

The actual patch does have all the code there but miss one important
step: we do not keep collation index, we keep charset index. Because of
this, some of the tests in the bugfix suite fail and thus main
functionality is turned off.

The reason of this patch to be put into PR at all is that it contains
changes that made CHAR/VARCHAR columns unsigned. This change is needed in
vectorization work.
2022-03-02 23:53:39 +03:00
86e495ae2f MCOL-4809 This fixes Centos crash caused by combination of ::reserve() + vector index access 2022-02-23 12:38:22 +00:00
3919c541ac New warnfixes (#2254)
* Fix clang warnings

* Remove vim tab guides

* initialize variables

* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length

* Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison

* chars are unsigned on ARM, having  if (ival < 0) always false

* chars are unsigned by default on ARM and comparison with -1 if always true
2022-02-17 13:08:58 +03:00
c79dfc4925 MCOL-4809 This patch adds support for float data types filtering and scanning vectorization 2022-02-03 16:38:56 +00:00
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
7b5845a4aa MCOL-4871 Bar's patch to do proper extent elimination for short CHAR 2021-12-17 17:41:03 +00:00
54a5623569 MCOL-4809 Review suggestions patch 2021-12-10 10:30:08 +00:00
af36f9940f This patch introduces support for scanning/filtering vectorized execution for numeric-based
data types TEXT, CHAR, VARCHAR, FLOAT and DOUBLE are not yet supported by vectorized path
This patch introduces an example for Google benchmarking suite to measure a perf diff
b/w legacy scan/filtering code and the templated version
2021-12-10 10:30:00 +00:00
3de038c1da MCOL-4876 This patch enables continues buffer to be used by ColumnCommand and aligns BPP::blockData
that in most cases was unaligned
2021-10-06 09:23:40 +00:00
67c85dae15 MCOL-4809 The patch replaces legacy scanning/filtering code with a number of templates that
simplifies control flow removing needless expressions
2021-09-06 17:04:52 +00:00
5c5f103f98 MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
b3a560300c Revert "Merge pull request #2022 from mariadb-corporation/bar-develop-MCOL-4791"
This reverts commit 4016e25e5b, reversing
changes made to 85435f6b1e.
2021-07-13 11:06:56 +00:00
8520f87237 MCOL-641 Cleanup. 2021-07-06 09:01:49 +00:00
e8126bede5 MCOL-4791 Fix ColumnCommand fudged data type format to clearly identify CHAR vs VARCHAR 2021-07-02 12:42:03 +04:00
284fc51bb7 MCOL-4726 Wrong result of WHERE char1_col='A' 2021-05-21 14:40:16 +04:00
765858bc5b MCOL-4498 LIKE is not collation aware 2021-03-22 20:42:01 +04:00
5bcc1cd1f0 A join patch for MCOL-4527 (a performance hack) and MCOL-4539 (a bug fix)
- MCOL-4527 Simple query performace is degraded between 5.4 and 5.5

  xxx_nopad_bin collations are now around 30% faster on simple queries like:

    SELECT * FROM t1 WHERE short_char_column_nopad_bin = 'literal'

  The gain is achieved by comparing two short CHAR values as uint64_t.

  Note, this patch does not affect xxx_bin collations!
  It wouldn't be correct to apply the same improvement for xxx_bin
  collations (i.e. with PAD SPACE attribute), because it would change
  the way how trailing spaces are compared.

- MCOL-4539 WHERE short_char_column='literal' ignores the collation on a huge table

  Only the first thread used a correct collation when performing:
    WHERE short_char_char='literal'
  Other (15) threads used the server default collation, because
  the charsetNumber attribute was not copyed during cloning.

- This patch also adds mtr/basic/suite.opt, so "mtr" can run without --extern.
2021-02-16 18:45:18 +04:00
a91fb15b07 Add PrimProc support for selective block loading for 16-byte columns. 2020-12-11 14:23:45 -05:00
52c5af054a Part#2 MCOL-495 Make string comparison not case sensitive
Fixing field='str' for short (non-Dict) CHAR and VARCHAR data types.
2020-12-04 08:40:29 +04:00
c5d4a918ee MCOL-4188 Regression fixes for MCOL-641.
1. In TupleAggregateStep::configDeliveredRowGroup(), use
jobInfo.projectionCols instead of jobInfo.nonConstCols
for setting scale and precision if the source column is
wide decimal.

2. Tighten rules for wide decimal processing. Specifically:
  a. Replace (precision > INT64MAXPRECISION) checks with
     (precision > INT64MAXPRECISION && precision <= INT128MAXPRECISION)
  b. At places where (colWidth == MAXDECIMALWIDTH) is not enough to
     determine if a column is wide decimal or not, also add a check on
     type being DECIMAL/UDECIMAL.
2020-11-24 20:15:33 -05:00
15b1bfa709 Fix fallthrough compilation warnings 2020-11-18 13:53:15 +00:00
3eb26c0d4a MCOL-4313 Introduced TSInt128 that is a storage class for int128
Removed uint128 from joblist/lbidlist.*

Another toString() method for wide-decimal that is EMPTY/NULL aware

Unified decimal processing in WF functions

Fixed a potential issue in EqualCompData::operator() for
    wide-decimal processing

Fixed some signedness warnings
2020-11-18 13:53:15 +00:00
1588ebe439 MCOL-641 Clean up primitives code
Add int128_t support into ByteStream

Fixed UTs broken after collation patch
2020-11-18 13:52:19 +00:00
d3bc68b02f MCOL-641 Refactor initial extent elimination support.
This commit also adds support in TupleHashJoinStep::forwardCPData,
although we currently do not support wide decimals as join keys.

Row estimation to determine large-side of the join is also updated.
2020-11-18 13:52:19 +00:00
74b64eb4f1 MCOL-641 1. Add support for int128_t in ParsedColumnFilter.
2. Set Decimal precision in SimpleColumn::evaluate().
3. Add support for int128_t in ConstantColumn.
4. Set IDB_Decimal::s128Value in buildDecimalColumn().
5. Use width 16 as first if predicate for branching based on decimal width.
2020-11-18 13:47:45 +00:00
824615a55b MCOL-641 Refactor empty value implementation in writeengine. 2020-11-18 13:47:44 +00:00