mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-11 18:21:49 +03:00

Author	SHA1	Message	Date
Sergey Zefirov	b53c231ca6	MCOL-271 empty strings should not be NULLs (#2794) This patch improves handling of NULLs in textual fields in ColumnStore. Previously, empty strings were considered NULLs, which could be a problem if the data schema allows empty strings. It was also one of the major reasons for behavior differences between ColumnStore and other engines in the MariaDB family. This patch also fixes some other bugs and incorrect behavior, for example, the incorrect comparison "column <= ''", which evaluated to constant True for all purposes before this patch.	2023-03-30 21:18:29 +03:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
Roman Nozdrin	878a8ab857	Compilation error fixes for the recent updates in container images	2022-10-14 17:53:48 +03:00
NTH19	7d76dc4534	AUX column scan (MCOL-5021) effectively disables vectorized scanning on ARM platforms. This patch resolves this issue and unifies AUX column processing on x86 and ARM using the template class SimdProcessor. The patch also replaces the uint16_t mask previously used in column.cpp and SimProcessor code with the native masks that each platform uses, e.g. __m128i or __m128 on x86 and a variety of masks on ARM. To unify the processing I introduced a new filtering compare operator, COMPARE_NULLEQ, with 'c1 IS NULL' semantics.	2022-10-07 10:32:54 +00:00
Gagan Goel	6a6fee5969	MCOL-5021 Followup. Allow the compiler to inline the call to nextColValue() in column.cpp.	2022-08-18 19:35:35 +00:00
Gagan Goel	cbfdae3481	MCOL-5021 Code changes based on review feedback.	2022-08-05 14:40:50 -04:00
Gagan Goel	262cd5c501	MCOL-5021 Remove hard-coded values for data type, column width and compression type for the AUX column, and replace them with constants defined in the execplan namespace.	2022-08-05 14:40:49 -04:00
Gagan Goel	c8b6b154bf	MCOL-5021 Add an option in Columnstore.xml, fastdelete (disabled by default), which when enabled, indiscriminately invalidates all column extents and performs the actual DELETE only on the AUX column. The trade-off with this approach would now be that the first SELECT for certain query patterns (those containing a WHERE predicate) after the DELETE operation will slow down as the invalidated column extent would need to be scanned again to set the min/max values.	2022-08-05 14:40:49 -04:00
Gagan Goel	2280b1dd25	MCOL-5021 Add support for the AUX column in ExeMgr and PrimProc. In the joblist code, in addition to sending the lbid of the SCAN column, we also send the corresponding lbid of the AUX column to PrimProc. In the primitives processor code in PrimProc, we load the AUX column block (8192 rows since the AUX column is implemented as a 1-byte UNSIGNED TINYINT) into memory and then pass it down to the low-level scanning (vectorized scanning as applicable) routine to build a non-Empty mask for the block being processed to filter out DELETED rows based on comparison of the AUX block row to the empty magic value for the AUX column.	2022-08-05 14:40:49 -04:00
Andrey Piskunov	c3a5731890	Rename cmpGt2	2022-08-04 16:16:38 +03:00
Andrey Piskunov	24b2c1c283	Vectorizing min/max for KIND_TEXT	2022-08-04 16:16:38 +03:00
NTH19	231930b71d	update	2022-08-04 16:16:38 +03:00
NTH19	19ca844cd1	support_max_min	2022-08-04 16:16:38 +03:00
Andrey Piskunov	bcb89e00f4	Remove include	2022-08-04 16:16:38 +03:00
Andrey Piskunov	589b786fda	Don't ignore null or empty in calculation	2022-08-04 16:16:38 +03:00
Andrey Piskunov	5c6cd2cca3	use vect update for everything except TEXT	2022-08-04 16:16:38 +03:00
Andrey Piskunov	20f48fd730	Vectorized update min max	2022-08-04 16:16:38 +03:00
Andrey Piskunov	f88a3bfc65	Remove include	2022-08-04 16:16:38 +03:00
Andrey Piskunov	b8200acd3b	Don't ignore null or empty in calculation	2022-08-04 16:16:38 +03:00
Andrey Piskunov	c4df7925d1	use vect update for everything except TEXT	2022-08-04 16:16:38 +03:00
Andrey Piskunov	9930d0dedd	Vectorized update min max	2022-08-04 16:16:38 +03:00
Roman Nozdrin	7cdc914b4e	MCOL-4809 This patch introduces vectorized scanning/filtering for short CHAR/VARCHAR columns. Short CHAR/VARCHAR column values contain integer-encoded strings. After certain manipulations (orderSwap(strnxfrm(str))) the values become integers that preserve the original strings' order relation according to certain translation rules (collation). Prepared values are ready to be SIMD-processed.	2022-04-01 10:28:33 +00:00
Leonid Fedorov	29679e91ec	Clang warnfixes (#2310)	2022-03-21 13:19:55 -05:00
Serguey Zefirov	53b9a2a0f9	MCOL-4580 extent elimination for dictionary-based text/varchar types. The idea is relatively simple: encode prefixes of collated strings as integers and use them to compute extents' ranges. Then we can eliminate extents with strings. The actual patch does have all the code there but misses one important step: we do not keep the collation index, we keep the charset index. Because of this, some of the tests in the bugfix suite fail and thus the main functionality is turned off. The reason this patch is put into a PR at all is that it contains changes that made CHAR/VARCHAR columns unsigned. This change is needed in the vectorization work.	2022-03-02 23:53:39 +03:00
Roman Nozdrin	86e495ae2f	MCOL-4809 This fixes Centos crash caused by combination of ::reserve() + vector index access	2022-02-23 12:38:22 +00:00
Leonid Fedorov	3919c541ac	New warnfixes (#2254) * Fix clang warnings * Remove vim tab guides * initialize variables * 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length * Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison * chars are unsigned on ARM, having if (ival < 0) always false * chars are unsigned by default on ARM and comparison with -1 is always true	2022-02-17 13:08:58 +03:00
Roman Nozdrin	c79dfc4925	MCOL-4809 This patch adds support for float data types filtering and scanning vectorization	2022-02-03 16:38:56 +00:00
Leonid Fedorov	04752ec546	clang format apply	2022-01-21 16:43:49 +00:00
Roman Nozdrin	7b5845a4aa	MCOL-4871 Bar's patch to do proper extent elimination for short CHAR	2021-12-17 17:41:03 +00:00
Roman Nozdrin	54a5623569	MCOL-4809 Review suggestions patch	2021-12-10 10:30:08 +00:00
Roman Nozdrin	af36f9940f	This patch introduces support for scanning/filtering vectorized execution for numeric-based data types. TEXT, CHAR, VARCHAR, FLOAT and DOUBLE are not yet supported by the vectorized path. This patch introduces an example for the Google benchmarking suite to measure the perf diff between the legacy scan/filtering code and the templated version.	2021-12-10 10:30:00 +00:00
Roman Nozdrin	3de038c1da	MCOL-4876 This patch enables a continuous buffer to be used by ColumnCommand and aligns BPP::blockData, which in most cases was unaligned	2021-10-06 09:23:40 +00:00
Roman Nozdrin	67c85dae15	MCOL-4809 The patch replaces legacy scanning/filtering code with a number of templates that simplify control flow by removing needless expressions	2021-09-06 17:04:52 +00:00
Leonid Fedorov	5c5f103f98	MCOL-4839: Fix clang build (#2100) * Fix clang build * Extern C returned to plugin_instance Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>	2021-08-23 10:45:10 -05:00
Gagan Goel	b3a560300c	Revert "Merge pull request #2022 from mariadb-corporation/bar-develop-MCOL-4791" This reverts commit `4016e25e5b`, reversing changes made to `85435f6b1e`.	2021-07-13 11:06:56 +00:00
Gagan Goel	8520f87237	MCOL-641 Cleanup.	2021-07-06 09:01:49 +00:00
Alexander Barkov	e8126bede5	MCOL-4791 Fix ColumnCommand fudged data type format to clearly identify CHAR vs VARCHAR	2021-07-02 12:42:03 +04:00
Alexander Barkov	284fc51bb7	MCOL-4726 Wrong result of WHERE char1_col='A'	2021-05-21 14:40:16 +04:00
Alexander Barkov	765858bc5b	MCOL-4498 LIKE is not collation aware	2021-03-22 20:42:01 +04:00
Alexander Barkov	5bcc1cd1f0	A join patch for MCOL-4527 (a performance hack) and MCOL-4539 (a bug fix) - MCOL-4527 Simple query performance is degraded between 5.4 and 5.5. xxx_nopad_bin collations are now around 30% faster on simple queries like: SELECT * FROM t1 WHERE short_char_column_nopad_bin = 'literal'. The gain is achieved by comparing two short CHAR values as uint64_t. Note, this patch does not affect xxx_bin collations! It wouldn't be correct to apply the same improvement for xxx_bin collations (i.e. with PAD SPACE attribute), because it would change the way trailing spaces are compared. - MCOL-4539 WHERE short_char_column='literal' ignores the collation on a huge table. Only the first thread used a correct collation when performing: WHERE short_char_char='literal'. Other (15) threads used the server default collation, because the charsetNumber attribute was not copied during cloning. - This patch also adds mtr/basic/suite.opt, so "mtr" can run without --extern.	2021-02-16 18:45:18 +04:00
Gagan Goel	a91fb15b07	Add PrimProc support for selective block loading for 16-byte columns.	2020-12-11 14:23:45 -05:00
Alexander Barkov	52c5af054a	Part#2 MCOL-495 Make string comparison not case sensitive Fixing field='str' for short (non-Dict) CHAR and VARCHAR data types.	2020-12-04 08:40:29 +04:00
Gagan Goel	c5d4a918ee	MCOL-4188 Regression fixes for MCOL-641. 1. In TupleAggregateStep::configDeliveredRowGroup(), use jobInfo.projectionCols instead of jobInfo.nonConstCols for setting scale and precision if the source column is wide decimal. 2. Tighten rules for wide decimal processing. Specifically: a. Replace (precision > INT64MAXPRECISION) checks with (precision > INT64MAXPRECISION && precision <= INT128MAXPRECISION) b. At places where (colWidth == MAXDECIMALWIDTH) is not enough to determine if a column is wide decimal or not, also add a check on type being DECIMAL/UDECIMAL.	2020-11-24 20:15:33 -05:00
Roman Nozdrin	15b1bfa709	Fix fallthrough compilation warnings	2020-11-18 13:53:15 +00:00
Roman Nozdrin	3eb26c0d4a	MCOL-4313 Introduced TSInt128 that is a storage class for int128. Removed uint128 from joblist/lbidlist.*. Another toString() method for wide-decimal that is EMPTY/NULL aware. Unified decimal processing in WF functions. Fixed a potential issue in EqualCompData::operator() for wide-decimal processing. Fixed some signedness warnings.	2020-11-18 13:53:15 +00:00
Roman Nozdrin	1588ebe439	MCOL-641 Clean up primitives code. Add int128_t support into ByteStream. Fixed UTs broken after collation patch.	2020-11-18 13:52:19 +00:00
Gagan Goel	d3bc68b02f	MCOL-641 Refactor initial extent elimination support. This commit also adds support in TupleHashJoinStep::forwardCPData, although we currently do not support wide decimals as join keys. Row estimation to determine large-side of the join is also updated.	2020-11-18 13:52:19 +00:00
Gagan Goel	74b64eb4f1	MCOL-641 1. Add support for int128_t in ParsedColumnFilter. 2. Set Decimal precision in SimpleColumn::evaluate(). 3. Add support for int128_t in ConstantColumn. 4. Set IDB_Decimal::s128Value in buildDecimalColumn(). 5. Use width 16 as first if predicate for branching based on decimal width.	2020-11-18 13:47:45 +00:00
Gagan Goel	824615a55b	MCOL-641 Refactor empty value implementation in writeengine.	2020-11-18 13:47:44 +00:00
Roman Nozdrin	97ee1609b2	MCOL-641 Replaced NULL binary constants. DataConvert::decimalToString, toString, writeIntPart, writeFractionalPart are not templates anymore.	2020-11-18 13:47:44 +00:00

75 Commits