mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-10 07:20:58 +03:00

Author	SHA1	Message	Date
Serguey Zefirov	6e995e2e80	fix: MCOL-5755: incorrect handling of BLOB (and TEXT) in GROUP BY. BLOB fields did not work as grouping keys at all: they were assigned the value NULL for any value, NULL or not. The fix is in rowaggregation.cpp, in initMapping(), where a switch/case branch was added to handle copying of BLOB fields. Also, TEXT columns did not distinguish between NULL and the empty string in the grouping algorithm; now they do. The fix is in the equals() function, which now specifically checks that both values agree on isNull().	2024-07-11 11:03:05 +03:00
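The equals() change can be illustrated with a minimal sketch. `NullableText` and `groupKeyEquals` are hypothetical names, not ColumnStore's API, and a SQL NULL is modeled here as an empty `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a nullable TEXT/BLOB value; not ColumnStore's own type.
struct NullableText {
  std::optional<std::string> value;  // std::nullopt means SQL NULL

  bool isNull() const { return !value.has_value(); }
};

// Grouping-key equality: two NULLs fall into the same group, a NULL never
// equals a non-NULL value (not even the empty string), and non-NULL values
// compare by content.
bool groupKeyEquals(const NullableText& a, const NullableText& b) {
  if (a.isNull() != b.isNull()) return false;  // NULL vs. '' must differ
  if (a.isNull()) return true;                 // NULL groups with NULL
  return *a.value == *b.value;
}
```

Checking `isNull()` before comparing contents is what keeps `NULL` and `''` in separate groups.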
Leonid Fedorov	86c1c5d537	fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure. The newly added invariant check that RGData knows the number of columns and fixed-size columns was failing for disk-based aggregation workloads, causing them to return wrong results. The assertion failure happened in RGData::getRow(uint32_t num, Row* row), which is called when finalizing sub-aggregation results, a step needed to merge partial results. Because the merge failed, disk-based aggregation queries output duplicate results. The assertion failure was caused by RGData::deserialize(ByteStream& bs, uint32_t defAmount) not setting rowSize and colCount when necessary, e.g. when deserializing into a new, default RGData that does not yet know anything about its structure. This is the case when the default constructor RGData() is used, which sets rowSize and columnCount to 0. Three code paths use the default RGData() ctor. The fix is for the one in RowGroupStorage::loadRG(uint64_t rgid, std::unique_ptr<RGData>& rgdata, bool unlinkDump = false), where a default RGData object is used to deserialize a ByteStream directly. The deserialize method now checks whether both rowSize and columnCount are 0 and, if so, sets both from the values read from the ByteStream. The other two uses of the default RGData ctor, in joinpartition.cpp and tuplejoiner.cpp, should probably be checked too. Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com>	2023-09-30 00:02:31 +03:00
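A simplified model of the deserialize() fix follows. It assumes a hypothetical `RowGroupData` type and a made-up wire format of two host-order `uint32_t` header fields followed by the payload. The real RGData/ByteStream layout is not reproduced here, and the consistency check in the `else` branch is an illustrative addition.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical, heavily simplified model of a row-group buffer; names are
// illustrative and do not match ColumnStore's RGData API.
struct RowGroupData {
  uint32_t rowSize = 0;      // 0 means "layout not known yet",
  uint32_t columnCount = 0;  // as left by a default constructor
  std::vector<uint8_t> data;

  // Reads [rowSize][columnCount][payload...] from `in`.
  void deserialize(const std::vector<uint8_t>& in) {
    if (in.size() < 2 * sizeof(uint32_t)) throw std::runtime_error("truncated stream");
    uint32_t readRowSize, readColumnCount;
    std::memcpy(&readRowSize, in.data(), sizeof readRowSize);
    std::memcpy(&readColumnCount, in.data() + sizeof readRowSize, sizeof readColumnCount);

    // A default-constructed object adopts the layout found in the stream.
    if (rowSize == 0 && columnCount == 0) {
      rowSize = readRowSize;
      columnCount = readColumnCount;
    } else {
      // Illustrative check (assumption): a known layout must match the stream.
      assert(rowSize == readRowSize && columnCount == readColumnCount);
    }
    data.assign(in.begin() + 2 * sizeof(uint32_t), in.end());
  }
};

// Builds a stream in the hypothetical format above (host byte order).
std::vector<uint8_t> serialize(uint32_t rowSize, uint32_t columnCount,
                               const std::vector<uint8_t>& payload) {
  std::vector<uint8_t> out(2 * sizeof(uint32_t));
  std::memcpy(out.data(), &rowSize, sizeof rowSize);
  std::memcpy(out.data() + sizeof rowSize, &columnCount, sizeof columnCount);
  out.insert(out.end(), payload.begin(), payload.end());
  return out;
}
```

Without the zero check, a default-constructed object would keep `rowSize == 0` after deserialization. That is the state that tripped the invariant in getRow().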
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support. Adds a special column that helps to distinguish data rows from rollup rows of various depths, and simple logic in row aggregation to process subtotals.	2023-09-26 17:01:53 +03:00
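The following toy illustrates GROUP BY a, b WITH ROLLUP with a depth marker, assuming SUM is the only aggregate and there are two grouping keys. `sumWithRollup` and the `depth` encoding are hypothetical. The commit does not say what values ColumnStore actually stores in its marker column.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct InputRow { std::string a; std::string b; int64_t value; };

// One output row of GROUP BY a, b WITH ROLLUP. `depth` plays the role of the
// extra marker column: 0 = regular group, 1 = subtotal over b, 2 = grand total.
// Rolled-up keys are left empty here; SQL would show NULL.
struct RollupRow { std::string a; std::string b; int64_t sum; int depth; };

std::vector<RollupRow> sumWithRollup(const std::vector<InputRow>& rows) {
  std::map<std::pair<std::string, std::string>, int64_t> groups;  // ordered by (a, b)
  std::map<std::string, int64_t> subtotals;
  int64_t total = 0;
  for (const auto& r : rows) {
    groups[{r.a, r.b}] += r.value;
    subtotals[r.a] += r.value;
    total += r.value;
  }
  std::vector<RollupRow> out;
  for (const auto& [a, sub] : subtotals) {
    // Detail rows for this `a`; "" is the smallest string, so lower_bound
    // lands on the first (a, b) key.
    for (auto it = groups.lower_bound({a, std::string()});
         it != groups.end() && it->first.first == a; ++it)
      out.push_back({a, it->first.second, it->second, 0});
    out.push_back({a, "", sub, 1});  // subtotal row for this `a`
  }
  // Assumption for this sketch: no input rows, no grand-total row.
  if (!rows.empty()) out.push_back({"", "", total, 2});
  return out;
}
```

The marker column is what lets a consumer tell a subtotal row apart from a regular group whose key happens to be NULL.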
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822)" (#2828). This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
Leonid Fedorov	f916e64927	No boost condition (#2822). This patch replaces boost primitives with stdlib counterparts.	2023-04-22 00:42:45 +03:00
Leonid Fedorov	c2d0fa24da	replace boost::shared_array<T> with std::shared_ptr<T[]>	2023-04-14 10:33:27 +00:00
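The replacement pattern is sketched below; `makeBuffer` is a hypothetical helper. Since C++17, `std::shared_ptr<T[]>` calls `delete[]` on release and provides `operator[]`. `std::make_shared<T[]>` only arrives in C++20, so the sketch uses `new T[n]()` directly.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Before: boost::shared_array<int> buf(new int[n]());
// After (C++17): std::shared_ptr<int[]> releases the array with delete[].
std::shared_ptr<int[]> makeBuffer(std::size_t n) {
  std::shared_ptr<int[]> buf(new int[n]());  // value-initialized to 0
  for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<int>(i * i);
  return buf;
}
```

Copies share ownership exactly as `boost::shared_array` copies did, so call sites typically only need the type name changed.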
Leonid Fedorov	a508b86091	remove boost/shared_array include	2023-04-14 09:42:50 +00:00
Leonid Fedorov	2e1394149b	MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792) * Fixes of bugs from ASAN warnings, part one * MQC as static library, with nifty counter for global map and mutex * Switch clang to 16 * link messageqcpp to execplan	2023-04-04 02:33:23 +03:00
Sergey Zefirov	b53c231ca6	MCOL-271 empty strings should not be NULLs (#2794). This patch improves handling of NULLs in textual fields in ColumnStore. Previously, empty strings were considered NULLs, which could be a problem if the data schema allows empty strings. It was also one of the major reasons for behavior differences between ColumnStore and other engines in the MariaDB family. This patch also fixes some other bugs and incorrect behavior, for example, the incorrect comparison "column <= ''", which evaluated to a constant True for all purposes before this patch.	2023-03-30 21:18:29 +03:00
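Once empty strings are no longer treated as NULL, `column <= ''` should be true only for empty, non-NULL values and UNKNOWN for NULL, rather than a constant True. The sketch below uses a hypothetical `lessOrEqual` with plain byte-wise comparison. Collations and PAD SPACE semantics are deliberately ignored, which is a simplifying assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class Tri { False, True, Unknown };  // SQL three-valued logic

// col <= rhs with NULL handling. Compares bytes via std::string::compare,
// ignoring collations and trailing-space padding (simplifying assumption).
Tri lessOrEqual(const std::optional<std::string>& col, const std::string& rhs) {
  if (!col) return Tri::Unknown;  // NULL <= anything is UNKNOWN
  return col->compare(rhs) <= 0 ? Tri::True : Tri::False;
}
```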
Roman Nozdrin	786b9da5b0	MCOL-5438 COUNT() in math causes SEGV	2023-03-09 20:35:38 +00:00
Leonid Fedorov	d2432f9bf6	get rid of pointers for 128 fields	2022-08-26 15:12:22 +00:00
mariadb-AndreyPiskunov	0863ecd279	Replace getBinaryField	2022-08-25 18:21:43 +03:00
Leonid Fedorov	fbd043b036	Fixing alignment for clang tests of rowgroup	2022-03-23 14:29:19 +00:00
Leonid Fedorov	3919c541ac	New warnfixes (#2254) * Fix clang warnings * Remove vim tab guides * initialize variables * 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length * Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison * chars are unsigned on ARM, making if (ival < 0) always false * chars are unsigned by default on ARM, so comparison with -1 is always true	2022-02-17 13:08:58 +03:00
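The ARM issue above arises because plain `char` is signed on x86-64 Linux but unsigned on most ARM ABIs. A test like `c < 0` on a plain `char` can therefore never be true on ARM. The hypothetical `isNegativeByte` sketch below spells out `signed char` so the check behaves the same on both.

```cpp
#include <cassert>

// With plain `char c = raw;` the test `c < 0` is always false on ARM, where
// char is unsigned. Converting to `signed char` makes the intent explicit.
// The conversion is modular since C++20 and implementation-defined before
// (GCC and Clang wrap, so 0xFF becomes -1).
bool isNegativeByte(unsigned char raw) {
  signed char c = static_cast<signed char>(raw);
  return c < 0;
}
```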
Leonid Fedorov	04752ec546	clang format apply	2022-01-21 16:43:49 +00:00
Denis Khalikov	7bda598fbf	MCOL-4810 Redundant copying and wasting memory in PrimProc. This patch eliminates copying of `long string`s into the bytestream.	2021-08-26 12:16:23 +03:00
Alexander Barkov	9794f24369	MCOL-4801 Replace Row methods getStringLength() and getStringPointer() with getConstString()	2021-07-06 21:15:32 +04:00
Gagan Goel	8520f87237	MCOL-641 Cleanup.	2021-07-06 09:01:49 +00:00
Roman Nozdrin	bed0b7c6bc	MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, and functional JOINs built on top of TypelessData	2021-06-24 08:07:23 +00:00
Alexey Antipovsky	475104e4d3	[MCOL-4709] Disk-based aggregation * Introduce multigeneration aggregation * Do not save unused part of RGDatas to disk * Add IO error explanation (strerror) * Reduce memory usage while aggregating * Introduce in-memory generations for better memory utilization * Try to keep the number of buckets low * Refactor disk aggregation a bit * Pass calculated hash into RowAggregation * Try to keep some RGData with free space in memory * Do not dump more than half of rowgroups to disk if generations are allowed; start a new generation instead * For each thread, shift the first processed bucket at each iteration, so the generations start more evenly * Unify temp data location * Explicitly create temp subdirectories whether disk aggregation/join are enabled or not	2021-06-06 16:09:15 +03:00
Alexander Barkov	9608533d92	MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop mcsconfig.h and my_config.h have the following pre-processor definitions: 1. Conflicting definitions coming from the standard cmake definitions: - PACKAGE - PACKAGE_BUGREPORT - PACKAGE_NAME - PACKAGE_STRING - PACKAGE_TARNAME - PACKAGE_VERSION - VERSION 2. Conflicting definitions of other kinds: - HAVE_STRTOLL - this is a flaw in the MariaDB headers. Should be fixed in the server code. In some cases my_config.h erroneously performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1". The former is not CMake-compatible style; the latter is. 3. Non-conflicting definitions: Otherwise, mcsconfig.h and my_config.h should be mutually compatible, because both are generated by cmake on the same host machine. So they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc. Observations: - It's OK to include both mcsconfig.h and my_config.h provided that we suppress duplicate definitions of the above conflicting types #1 and #2. - There is no need to suppress the duplicate definitions mentioned in #3, as they are compatible! - my_sys.h and m_ctype.h must always follow a CMake configuration header, either my_config.h or mcsconfig.h (or both). They must never be included without a preceding configuration header. This change makes sure that we resolve conflicts by: - either disallowing inclusion of mcsconfig.h and my_config.h at the same time - or by hiding conflicting definitions #1 and #2 (and restoring them later). - also, by making sure that my_sys.h and m_ctype.h always follow a CMake configuration file. Details: - idb_mysql.h can now be included only after my_config.h. An attempt to use idb_mysql.h with mcsconfig.h instead of my_config.h is caught by the "#error" preprocessor directive. - mariadb_my_sys.h can now be included only after mcsconfig.h. An attempt to use mariadb_my_sys.h without mcsconfig.h (e.g. with my_config.h) is also caught by "#error".
- collation.h can now be included in two ways. It now has the following effective structure: #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H) // Remember current conflicting definitions on the preprocessor stack // Undefine current conflicting definitions #endif #include "mcsconfig.h" #include "m_ctype.h" #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H) // Restore conflicting definitions from the preprocessor stack #endif and can be included as follows: a. using only mcsconfig.h as a configuration header: // my_config.h must not be included so far #include "collation.h" b. using my_config.h as the first included configuration file: #define PREFER_MY_CONFIG_H // Force conflict resolution #include "my_config.h" // can be included directly or indirectly ... #include "collation.h" Other changes: - Adding helper header files utils/common/mcsconfig_conflicting_defs_remember.h utils/common/mcsconfig_conflicting_defs_restore.h utils/common/mcsconfig_conflicting_defs_undef.h to make conflict resolution easier. - Removing `#include "collation.h"` from a number of files, as it's automatically included from rowgroup.h. - Removing redundant `#include "utils_utf8.h"`. This change is not directly related to the problem being fixed, but it's nice to remove redundant directives for both collation.h and utils_utf8.h from all the files that do not really need them. (This change could probably have gone into a separate commit.) - Changing my_init() to MY_INIT(argv[0]) in the MCS services sources. After the compilation failure was fixed, it turned out that ColumnStore services compiled in debug mode crash due to recent changes in safemalloc. The crash happened in strcmp() with `my_progname` as an argument (where my_progname is a mysys global variable). This problem should probably be fixed on the server side as well to avoid passing NULL. But the majority of MariaDB executable programs also use MY_INIT(argv[0]) rather than my_init().
So let's make MCS do what the other programs do.	2021-05-25 12:34:36 +04:00
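The "remember, undefine, restore" scheme described above can be expressed with `#pragma push_macro` and `#pragma pop_macro`, which GCC, Clang and MSVC support. It is an assumption that the helper headers named in the commit use these pragmas, since the commit only says "preprocessor stack". The two `PACKAGE` values are made up and stand in for my_config.h and mcsconfig.h.

```cpp
#include <cassert>
#include <cstring>  // std::strcmp, for checking the captured values

#define PACKAGE "mariadb"          // as if defined by my_config.h

#pragma push_macro("PACKAGE")      // remember the current definition
#undef PACKAGE                     // hide it from the next header
#define PACKAGE "columnstore"      // as if defined by mcsconfig.h
static const char* const kInnerPackage = PACKAGE;  // code compiled in between
#undef PACKAGE
#pragma pop_macro("PACKAGE")       // restore the remembered definition

static const char* const kOuterPackage = PACKAGE;  // "mariadb" again
```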
Alexander Barkov	bd4cbb542d	MCOL-4721 CHAR(1) is not collation-aware for GROUP/DISTINCT	2021-05-18 16:14:53 +04:00
David Hall	13b7a794e4	MCOL-4620 Add charset to various RowGroup initializers Specifically to operator+=	2021-03-19 16:57:54 -05:00
Roman Nozdrin	494bde61e1	MCOL-4409 Moved static Decimal conversion methods into VDecimal class MCOL-4409 This patch combines VDecimal and Decimal and makes IDB_Decimal an alias for the result class MCOL-4409 More boilerplate reduction in Func_mod Removed a couple of TSInt128::toType() methods	2020-11-30 12:08:52 +00:00
Roman Nozdrin	2003417a89	Merge pull request #1624 from mariadb-corporation/develop-bar-MCOL-4422 MCOL-4422 Remove mariadb.h and my_sys.h dependency from collation.h	2020-11-30 15:01:17 +03:00
Alexander Barkov	2ea73846b9	MCOL-4422 Remove mariadb.h and my_sys.h dependency from collation.h	2020-11-30 14:26:35 +04:00
Roman Nozdrin	a53119d5d5	Fix crash in release builds that happens in RowGroup::initRow() for wide DECIMAL	2020-11-30 08:17:27 +00:00
Gagan Goel	995cadef2d	MCOL-641 Fix alter table add wide decimal column. This patch also removes CalpontSystemCatalog::BINARY and ddlpackage::DDL_BINARY that were added during the initial stages of the work on MCOL-641.	2020-11-20 19:49:54 -05:00
Roman Nozdrin	58495d0d2f	MCOL-4387 Convert dataconvert::decimalToString() into VDecimal and TSInt128 methods	2020-11-18 13:53:16 +00:00
Roman Nozdrin	15b1bfa709	Fix fallthrough compilation warnings	2020-11-18 13:53:15 +00:00
Alexander Barkov	129d5b5a0f	MCOL-4174 Review/refactor frontend/connector code	2020-11-18 13:53:15 +00:00
Gagan Goel	68244ab957	MCOL-641 Fix regression in aggregate distinct on narrow decimal. The else if block in Row::equals() was incorrectly getting triggered for narrow decimals earlier. We now specifically check if the column is a wide decimal. Furthermore, we need to dereference the int128_t pointers for equality comparison.	2020-11-18 13:52:20 +00:00
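Below is a sketch of the two points in this fix: branch on the column width explicitly, and compare wide values rather than pointers. `decimalEquals` is hypothetical, and it uses `memcpy` instead of dereferencing `int128_t*` so that unaligned buffers are safe, which is a variation on what the commit describes. `__int128` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

__extension__ typedef __int128 int128_t;  // GCC/Clang extension

// Compares two stored DECIMAL values of the given byte width.
bool decimalEquals(const unsigned char* lhs, const unsigned char* rhs, std::size_t width) {
  if (width == 16) {  // only wide DECIMAL takes this branch
    int128_t a, b;
    std::memcpy(&a, lhs, sizeof a);  // buffer may not be 16-byte aligned
    std::memcpy(&b, rhs, sizeof b);
    return a == b;  // compare values; `lhs == rhs` would compare addresses
  }
  // Narrow DECIMAL (1, 2, 4 or 8 bytes): same-width integers are equal
  // exactly when their bytes are.
  return std::memcmp(lhs, rhs, width) == 0;
}
```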
Roman Nozdrin	bd0d5af123	Merge fixes.	2020-11-18 13:51:26 +00:00
Gagan Goel	74b64eb4f1	MCOL-641 1. Add support for int128_t in ParsedColumnFilter. 2. Set Decimal precision in SimpleColumn::evaluate(). 3. Add support for int128_t in ConstantColumn. 4. Set IDB_Decimal::s128Value in buildDecimalColumn(). 5. Use width 16 as first if predicate for branching based on decimal width.	2020-11-18 13:47:45 +00:00
Roman Nozdrin	b09f3088ca	MCOL-641 Initial version of Math operations for wide decimal.	2020-11-18 13:47:44 +00:00
Gagan Goel	62d0c82d75	MCOL-641 1. Templatized convertValueNum() function. 2. Allocate int128_t buffers in batchprimitiveprocessor if a query involves wide decimal columns.	2020-11-18 13:47:44 +00:00
Gagan Goel	9b714274db	MCOL-641 1. Minor refactoring of decimalToString for int128_t. 2. Update unit tests for decimalToString. 3. Allow support for wide decimal in TupleConstantStep::fillInConstants().	2020-11-18 13:47:44 +00:00
Roman Nozdrin	97ee1609b2	MCOL-641 Replaced NULL binary constants. DataConvert::decimalToString, toString, writeIntPart, writeFractionalPart are not templates anymore.	2020-11-18 13:47:44 +00:00
drrtuy	b29d0c9daa	MCOL-641 Changed the hint to search for GTest headers. This commit introduces DataConvert UTs. DataConvert::decimalToString can now handle negative values. Next version for Row::toString(), applyMapping UT checks. Row::equals() is now wide-DECIMAL aware.	2020-11-18 13:47:02 +00:00
Roman Nozdrin	c23ead2703	MCOL-641 This commit changes NULL and EMPTY values. It also contains the refactored DataConvert::decimalToString(). Row::toString UT is finished.	2020-11-18 13:47:02 +00:00
Roman Nozdrin	de85e21c38	MCOL-641 This commit cleans up Row methods and adds a couple of UTs for Row.	2020-11-18 13:47:02 +00:00
Roman Nozdrin	f73de30427	MCOL-641 This commit introduces GTest Suite into CS. Binary NULL magic now consists of a series of BINARYEMPTYROW-s + BINARYNULL in the end. ByteStream now has hexbyte alias. Added ColumnCommand::getEmptyRowValue to support 16 byte EMPTY values.	2020-11-18 13:47:01 +00:00
drrtuy	84f9821720	MCOL-641 Switched to DataConvert static methods in joblist code. Replaced BINARYEMPTYROW and BINARYNULL values. We need separate magic values for numeric and non-numeric binary types because numeric types can't tolerate losing the 0 previously used for magic values. atoi128() now parses the minus sign and produces negative values. RowAggregation::isNull() now uses Row::isNull() for DECIMAL.	2020-11-18 13:47:01 +00:00
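The sketch below shows sign handling when parsing into a 128-bit integer. `parseInt128` is a hypothetical stand-in, not the signature of ColumnStore's atoi128(). It assumes at most 38 digits, as in DECIMAL(38), so the accumulator cannot overflow, and it does no error reporting.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

__extension__ typedef __int128 int128_t;  // GCC/Clang extension

// Optional sign, then decimal digits; stops at the first non-digit.
// Assumes at most 38 digits (10^38 - 1 < 2^127), so no overflow checks.
int128_t parseInt128(const std::string& s) {
  std::size_t i = 0;
  bool negative = false;
  if (i < s.size() && (s[i] == '-' || s[i] == '+')) {
    negative = (s[i] == '-');
    ++i;
  }
  int128_t result = 0;
  for (; i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])); ++i)
    result = result * 10 + (s[i] - '0');
  return negative ? -result : result;
}
```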
drrtuy	0ff0472842	MCOL-641 sum() now works with DECIMAL(38) columns. TupleAggregateStep class methods and buildAggregateColumn() now properly set the result data type. doSum() now handles DECIMAL(38) in an appropriate manner. Low-level NULL-related methods for the new binary-based data types now handle magic values for binary-based DTs.	2020-11-18 13:47:01 +00:00
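The sketch below sums DECIMAL(38) values stored as unscaled 128-bit integers, assuming all inputs share the same scale. `sumDecimal38` is hypothetical. Rejecting results outside ±(10^38 - 1) is an assumption of this sketch, not a description of doSum(). `__builtin_add_overflow` is a GCC/Clang builtin that also accepts `__int128`.

```cpp
#include <cassert>
#include <vector>

__extension__ typedef __int128 int128_t;  // GCC/Clang extension

// Largest unscaled magnitude that fits in DECIMAL(38, s): 10^38 - 1.
int128_t maxDecimal38() {
  int128_t v = 1;
  for (int i = 0; i < 38; ++i) v *= 10;
  return v - 1;
}

// Sums unscaled values; returns false if an intermediate sum overflows
// int128_t or the final result does not fit in DECIMAL(38).
bool sumDecimal38(const std::vector<int128_t>& values, int128_t& out) {
  int128_t acc = 0;
  for (int128_t v : values)
    if (__builtin_add_overflow(acc, v, &acc)) return false;
  const int128_t limit = maxDecimal38();
  if (acc > limit || acc < -limit) return false;
  out = acc;
  return true;
}
```

Checking the DECIMAL(38) range only at the end lets intermediate sums exceed 38 digits, which int128_t can hold up to about 1.7 * 10^38.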
drrtuy	98213c0094	MCOL-641 Addition now works for DECIMAL columns with precision > 18.	2020-11-18 13:47:01 +00:00
drrtuy	54c152d6c8	MCOL-641 This commit introduces templates for DataConvert and RowGroup methods.	2020-11-18 13:47:01 +00:00
Gagan Goel	32f6167067	MCOL-641 Work of Ivan Zuniga on basic read and write support for Binary16	2020-11-18 13:47:00 +00:00
Gagan Goel	452f83f333	Properly initialize hasCollation data member of Row and RowGroup classes.	2020-10-30 16:28:35 +00:00
Gagan Goel	2ba9263df4	Silence -Werror=implicit-fallthrough compiler errors - Patch from Monty. The patch also fixes some potential bugs due to missing break statements.	2020-06-26 12:32:57 -04:00
David Hall	f9078efbc6	MCOL-3536 Collation	2020-06-08 17:57:37 -05:00


81 Commits