mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-11-16 23:23:22 +03:00

Author	SHA1	Message	Date
Denis Khalikov	865cca11c9	MCOL-5505 Add TypeHandler functions.	2023-11-30 01:47:13 +04:00
HanpyBin	fe597ec78c	MCOL-5505 add parquet support for cpimport and add mcs_parquet_ddl and mcs_parquet_gen tools	2023-11-30 01:47:13 +04:00
Roman Nozdrin	eb744eafed	chore(datatypes): this refactors the placement of the main SQL data types enum to enable templates that are parametrized with this enum (see mcs_datatype_basic.h changes for more details).	2023-10-24 18:44:35 +03:00
Gagan Goel	931f2b36a1	MCOL-4931 Make cpimport charset-aware. (#2938) 1. Extend the following CalpontSystemCatalog member functions to set CalpontSystemCatalog::ColType::charsetNumber, after the system catalog update to add charset number to calpontsys.syscolumn in MCOL-5005: CalpontSystemCatalog::lookupOID CalpontSystemCatalog::colType CalpontSystemCatalog::columnRIDs CalpontSystemCatalog::getSchemaInfo 2. Update cpimport to use the CHARSET_INFO object associated with the charset number retrieved from the system catalog, for a dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate long strings that exceed the target column character length. 3. Add MTR test cases.	2023-09-05 17:17:20 +03:00
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822)" (#2828) This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
Leonid Fedorov	f916e64927	No boost condition (#2822) This patch replaces boost primitives with stdlib counterparts.	2023-04-22 00:42:45 +03:00
Leonid Fedorov	2e1394149b	MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792) * Fixes of bugs from ASAN warnings, part one * MQC as static library, with nifty counter for global map and mutex * Switch clang to 16 * link messageqcpp to execplan	2023-04-04 02:33:23 +03:00
Sergey Zefirov	b53c231ca6	MCOL-271 empty strings should not be NULLs (#2794) This patch improves handling of NULLs in textual fields in ColumnStore. Previously empty strings were considered NULLs and it could be a problem if data scheme allows for empty strings. It was also one of major reasons of behavior difference between ColumnStore and other engines in MariaDB family. Also, this patch fixes some other bugs and incorrect behavior, for example, incorrect comparison for "column <= ''" which evaluates to constant True for all purposes before this patch.	2023-03-30 21:18:29 +03:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
Serguey Zefirov	53b9a2a0f9	MCOL-4580 extent elimination for dictionary-based text/varchar types The idea is relatively simple - encode prefixes of collated strings as integers and use them to compute extents' ranges. Then we can eliminate extents with strings. The actual patch does have all the code there but misses one important step: we do not keep the collation index, we keep the charset index. Because of this, some of the tests in the bugfix suite fail and thus the main functionality is turned off. The reason this patch is put into a PR at all is that it contains changes that make CHAR/VARCHAR columns unsigned. This change is needed in vectorization work.	2022-03-02 23:53:39 +03:00
Gagan Goel	973e5024d8	MCOL-4957 Fix performance slowdown for processing TIMESTAMP columns. Part 1: As part of MCOL-3776 to address synchronization issue while accessing the fTimeZone member of the Func class, mutex locks were added to the accessor and mutator methods. However, this slows down processing of TIMESTAMP columns in PrimProc significantly as all threads across all concurrently running queries would serialize on the mutex. This is because PrimProc only has a single global object for the functor class (class derived from Func in utils/funcexp/functor.h) for a given function name. To fix this problem: (1) We remove the fTimeZone as a member of the Func derived classes (hence removing the mutexes) and instead use the fOperationType member of the FunctionColumn class to propagate the timezone values down to the individual functor processing functions such as FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc. (2) To achieve (1), a timezone member is added to the execplan::CalpontSystemCatalog::ColType class. Part 2: Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime() and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds since unix epoch and broken-down representation. These functions in turn call the C library function localtime_r() which currently has a known bug of holding a global lock via a call to __tz_convert. This significantly reduces performance in multi-threaded applications where multiple threads concurrently call localtime_r(). More details on the bug: https://sourceware.org/bugzilla/show_bug.cgi?id=16145 This bug in localtime_r() caused processing of the Functors in PrimProc to slowdown significantly since a query execution causes Functors code to be processed in a multi-threaded manner. 
As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime() and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion (done in dataconvert::timeZoneToOffset()) during the execution plan creation in the plugin. Note that localtime_r() is only called when the time_zone system variable is set to "SYSTEM". This fix also required changing the timezone type from a std::string to a long across the system.	2022-02-14 14:12:27 -05:00
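The idea in Part 2 of MCOL-4957, converting seconds since the epoch to a broken-down time from a precomputed numeric offset instead of calling localtime_r(), can be sketched as below. This is a minimal illustration under stated assumptions, not the code in dataconvert: the struct, the function name `epochToBrokenDown`, and the use of Howard Hinnant's public-domain civil-from-days arithmetic are all choices made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch (not a ColumnStore type).
struct BrokenDownTime
{
    int64_t year;
    unsigned month, day, hour, minute, second;
};

// Hypothetical helper: converts seconds since the Unix epoch to a calendar
// date and time, applying a fixed offset in seconds. No C library time
// functions are called, so no global lock is involved.
BrokenDownTime epochToBrokenDown(int64_t secondsSinceEpoch, long offsetSeconds)
{
    const int64_t t = secondsSinceEpoch + offsetSeconds;

    // Split into whole days and seconds within the day (floor division).
    int64_t days = t / 86400;
    int64_t rem = t % 86400;
    if (rem < 0)
    {
        rem += 86400;
        --days;
    }
    assert(rem >= 0 && rem < 86400);

    // Civil-from-days arithmetic on the proleptic Gregorian calendar.
    const int64_t z = days + 719468;
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    const unsigned doe = static_cast<unsigned>(z - era * 146097);
    const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    const unsigned mp = (5 * doy + 2) / 153;

    BrokenDownTime out;
    out.day = doy - (153 * mp + 2) / 5 + 1;
    out.month = mp < 10 ? mp + 3 : mp - 9;
    out.year = static_cast<int64_t>(yoe) + era * 400 + (out.month <= 2 ? 1 : 0);
    out.hour = static_cast<unsigned>(rem / 3600);
    out.minute = static_cast<unsigned>((rem % 3600) / 60);
    out.second = static_cast<unsigned>(rem % 60);
    return out;
}
```

With an offset of `3 * 3600`, epoch second 0 maps to 03:00:00 on 1970-01-01. The offset is assumed to be resolved once, before query execution, which is the role the commit message assigns to dataconvert::timeZoneToOffset().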
Leonid Fedorov	04752ec546	clang format apply	2022-01-21 16:43:49 +00:00
Alexander Barkov	9608533d92	MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop mcsconfig.h and my_config.h have the following pre-processor definitions: 1. Conflicting definitions coming from the standard cmake definitions: - PACKAGE - PACKAGE_BUGREPORT - PACKAGE_NAME - PACKAGE_STRING - PACKAGE_TARNAME - PACKAGE_VERSION - VERSION 2. Conflicting definitions of other kinds: - HAVE_STRTOLL - this is dirt in MariaDB headers. Should be fixed in the server code. my_config.h erroneously performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1" in some cases. The former is not CMake compatible style. The latter is. 3. Non-conflicting definitions: Otherwise, mcsconfig.h and my_config.h should be mutually compatible, because both are generated by cmake on the same host machine. So they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc. Observations: - It's OK to include both mcsconfig.h and my_config.h providing that we suppress duplicate definition of the above conflicting types #1 and #2. - There is no need to suppress duplicate definitions mentioned in #3, as they are compatible! - my_sys.h and m_ctype.h must always follow a CMake configuration header, either my_config.h or mcsconfig.h (or both). They must never be included without any preceding configuration header. This change makes sure that we resolve conflicts by: - either disallowing inclusion of mcsconfig.h and my_config.h at the same time - or by hiding conflicting definitions #1 and #2 (restoring them afterwards). - also, by making sure that my_sys.h and m_ctype.h always follow a CMake configuration file. Details: - idb_mysql.h can now be included only after my_config.h. An attempt to use idb_mysql.h with mcsconfig.h instead of my_config.h is caught by the "#error" preprocessor directive. - mariadb_my_sys.h can now be included only after mcsconfig.h. An attempt to use mariadb_my_sys.h without mcsconfig.h (e.g. with my_config.h) is also caught by "#error". 
- collation.h can now be included in two ways. It now has the following effective structure: #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H) // Remember current conflicting definitions on the preprocessor stack // Undefine current conflicting definitions #endif #include "mcsconfig.h" #include "m_ctype.h" #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H) // Restore conflicting definitions from the preprocessor stack #endif and can be included as follows: a. using only mcsconfig.h as a configuration header: // my_config.h must not be included so far #include "collation.h" b. using my_config.h as the first included configuration file: #define PREFER_MY_CONFIG_H // Force conflict resolution #include "my_config.h" // can be included directly or indirectly ... #include "collation.h" Other changes: - Adding helper header files utils/common/mcsconfig_conflicting_defs_remember.h utils/common/mcsconfig_conflicting_defs_restore.h utils/common/mcsconfig_conflicting_defs_undef.h to perform conflict resolution easier. - Removing `#include "collation.h"` from a number of files, as it's automatically included from rowgroup.h. - Removing redundant `#include "utils_utf8.h"`. This change is not directly related to the problem being fixed, but it's nice to remove redundant directives for both collation.h and utils_utf8.h from all the files that do not really need them. (this change could probably have gone as a separate commit) - Changing my_init() to MY_INIT(argv[0]) in the MCS services sources. After the fix of the compilation failure it appeared that ColumnStore services compiled with the debug build crash due to recent changes in safemalloc. The crash happened in strcmp() with `my_progname` as an argument (where my_progname is a mysys global variable). This problem should probably be fixed on the server side as well to avoid passing NULL. But, the majority of MariaDB executable programs also use MY_INIT(argv[0]) rather than my_init(). 
So let's make MCS do like the other programs do.	2021-05-25 12:34:36 +04:00
Gagan Goel	f6b55c1e18	MCOL-4177 Add support for bulk insertion for wide decimals. 1. This patch adds support for wide decimals with/without scale to cpimport. In addition, INSERT ... SELECT and LDI are also now supported. 2. Logic to compute the number of bytes to convert a binary representation in the buffer to a narrow decimal is also simplified.	2020-12-15 22:14:54 +00:00
Alexander Barkov	d5c6645ba1	Adding mcs_basic_types.h For now it consists of only: using int128_t = __int128; using uint128_t = unsigned __int128; All new primitive data types should go into this file in the future.	2020-11-18 13:53:15 +00:00
Alexander Barkov	129d5b5a0f	MCOL-4174 Review/refactor frontend/connector code	2020-11-18 13:53:15 +00:00
Gagan Goel	d3bc68b02f	MCOL-641 Refactor initial extent elimination support. This commit also adds support in TupleHashJoinStep::forwardCPData, although we currently do not support wide decimals as join keys. Row estimation to determine large-side of the join is also updated.	2020-11-18 13:52:19 +00:00
Gagan Goel	74b64eb4f1	MCOL-641 1. Add support for int128_t in ParsedColumnFilter. 2. Set Decimal precision in SimpleColumn::evaluate(). 3. Add support for int128_t in ConstantColumn. 4. Set IDB_Decimal::s128Value in buildDecimalColumn(). 5. Use width 16 as first if predicate for branching based on decimal width.	2020-11-18 13:47:45 +00:00
Gagan Goel	9b714274db	MCOL-641 1. Minor refactoring of decimalToString for int128_t. 2. Update unit tests for decimalToString. 3. Allow support for wide decimal in TupleConstantStep::fillInConstants().	2020-11-18 13:47:44 +00:00
Gagan Goel	55afcd8890	MCOL-641 Basic extent elimination support for Decimal38.	2020-11-18 13:47:01 +00:00
David Hall	1f3d1e6fd6	MCOL-3536 collation	2020-05-14 16:02:49 -05:00
Patrick LeBlanc	c26adc6259	MCOL-3716: Wrong min/max set for short strings cpimport was doing unsigned comparisons for these, but initializing max to MIN_BIGINT (0x8000000000000002), which is > than any ascii string, so it would never get set. Changed the init value to 0 for char types.	2020-01-24 10:30:17 -05:00
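The failure mode described in MCOL-3716 can be reproduced in a few lines. This is only a sketch: `trackUnsignedMax` and the sample values are hypothetical, and the way cpimport packs short strings into integers is not modelled here beyond assuming ASCII-only data leaves the high bit clear.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not cpimport's code: scans values with an unsigned
// comparison and returns the largest one seen, starting from initialMax.
uint64_t trackUnsignedMax(const std::vector<uint64_t>& values, uint64_t initialMax)
{
    uint64_t max = initialMax;
    for (uint64_t v : values)
        if (v > max)
            max = v;
    return max;
}

void demoMaxInit()
{
    // Assumed sample data: values with the high bit clear.
    const std::vector<uint64_t> asciiValues = {0x61, 0x7A, 0x6D};

    // Starting from the value quoted in the commit message, no sample ever
    // compares greater, so the maximum is never updated.
    assert(trackUnsignedMax(asciiValues, 0x8000000000000002ULL) == 0x8000000000000002ULL);

    // Starting from 0, the true maximum is recorded.
    assert(trackUnsignedMax(asciiValues, 0) == 0x7A);
}
```

The initial value has to be the smallest value under the same ordering the comparison uses, which for an unsigned comparison is 0.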
Andrew Hutchings	8633859dd4	MCOL-3514 Add support for S3 to cpimport cpimport now has the ability to use libmarias3 to read an object from an S3 bucket instead of a file on local disk. This also moves libmarias3 to utils/libmarias3.	2019-09-24 10:31:22 +01:00
Andrew Hutchings	811909aa72	Merge branch 'develop-1.2' into develop-merge-up-20190729	2019-07-29 12:19:26 +01:00
Andrew Hutchings	cddb776bd4	Merge branch 'develop-1.1' into develop-1.2-merge-up-20190619	2019-06-19 18:34:43 +01:00
Andrew Hutchings	e3cd205388	MCOL-1968 Fix UTF char/varchar min/max handling If the first byte of a char/varchar was > 0x80 then it will break the min/max values for an extent during cpimport. This patch makes the min/max compare unsigned and only switches to signed when storing. In addition send all the LDI / INSERT...SELECT data to cpimport, not truncated. Let cpimport figure out the truncation point.	2019-06-11 10:37:04 +01:00
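The signed-versus-unsigned ordering problem behind MCOL-1968 can be illustrated as follows. This is a hedged sketch: `lessUnsigned` is a hypothetical name, and it is an assumption of the example that a leading non-ASCII byte ends up setting the high bit of the 64-bit value being compared.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: orders two 64-bit values as unsigned, whatever signed
// type they are stored in. The conversion to uint64_t is well defined.
bool lessUnsigned(int64_t a, int64_t b)
{
    return static_cast<uint64_t>(a) < static_cast<uint64_t>(b);
}

void demoUnsignedOrdering()
{
    const int64_t ascii = 0x61;          // high bit clear
    const int64_t highBit = INT64_MIN;   // high bit set, negative when signed

    // A signed comparison sorts the high-bit value below the ASCII one.
    assert(highBit < ascii);

    // The unsigned comparison sorts it above, as byte-wise ordering expects.
    assert(lessUnsigned(ascii, highBit));
    assert(!lessUnsigned(highBit, ascii));
}
```

Keeping the stored type signed while comparing as unsigned mirrors the commit's description of comparing unsigned and switching to signed only when storing.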
Andrew Hutchings	5e4f1b9933	Merge branch 'develop' into MCOL-265	2019-06-10 13:58:03 +01:00
Gagan Goel	e89d1ac3cf	MCOL-265 Add support for TIMESTAMP data type	2019-04-23 00:00:09 -04:00
David Mott	b2810bf35d	fix ambiguous symbol	2019-04-18 04:43:28 -05:00
Gagan Goel	d1ada75395	MCOL-270 Add support for MEDIUMINT data type	2018-12-30 19:13:16 -05:00
Roman Nozdrin	6563f48e32	MCOL-1786 Reduce the performance degradation caused by iequals.	2018-11-21 19:43:46 +03:00
Gagan Goel	f8a9ce0fb5	MCOL-1786 Handle "true" keyword for numeric data types in cpimport	2018-10-10 01:13:39 -04:00
Andrew Hutchings	1a582eed4a	Merge branch 'develop-1.1' into 1.1-merge-up-20180509-a2	2018-05-09 09:20:55 +01:00
Andrew Hutchings	c40903de9b	MCOL-392 Apply astyle Make this branch apply our style guidelines	2018-05-01 09:52:26 +01:00
Andrew Hutchings	bd50bbb8bb	MCOL-392 Fix saturation handling	2018-04-30 09:42:41 +01:00
Andrew Hutchings	3c1ebd8b94	MCOL-392 Add initial TIME datatype support	2018-04-30 09:42:41 +01:00
Andrew Hutchings	59d0a45da3	Merge branch 'develop-1.1' into 1.1-merge-up	2017-12-12 20:26:00 +00:00
Andrew Hutchings	3d5bd3809c	MCOL-444 Truncate UTF8 correctly cpimport would truncate UTF8 data half way through a character which would cause problems for functions using that data. This patch calculates the correct truncation point when inserting the data.	2017-11-29 10:43:57 +00:00
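Finding a truncation point that does not split a UTF-8 character, as MCOL-444 describes, can be sketched like this. The function name `utf8SafeTruncationPoint` is hypothetical and the code is not taken from cpimport; it assumes the input is valid UTF-8 and that the limit is expressed in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the largest length <= maxBytes at which the
// string can be cut without splitting a multi-byte UTF-8 character.
std::size_t utf8SafeTruncationPoint(const std::string& s, std::size_t maxBytes)
{
    if (s.size() <= maxBytes)
        return s.size();

    std::size_t end = maxBytes;
    // UTF-8 continuation bytes match the bit pattern 10xxxxxx. If the first
    // byte that would be dropped is a continuation byte, the cut falls inside
    // a character, so move back until the cut sits on a character boundary.
    while (end > 0 && (static_cast<unsigned char>(s[end]) & 0xC0) == 0x80)
        --end;
    return end;
}

void demoUtf8Truncation()
{
    const std::string text = "a\xC3\xA9";  // 'a' followed by a two-byte character

    assert(utf8SafeTruncationPoint(text, 3) == 3);  // fits entirely
    assert(utf8SafeTruncationPoint(text, 2) == 1);  // cut would split the character
    assert(utf8SafeTruncationPoint(text, 1) == 1);  // cut already on a boundary
}
```

A caller would then keep `text.substr(0, utf8SafeTruncationPoint(text, limit))`.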
Andrew Hutchings	01446d1e22	Reformat all code to coding standard	2017-10-26 17:18:17 +01:00
Andrew Hutchings	d551b7d6e0	MCOL-298 Fix saturated date/datetime handling Saturated DML updates would be set to NULL as were saturated cpimport values. This sets them to the zero date/datetime value.	2016-09-14 19:58:11 +01:00
david hill	f6afc42dd0	the beginning	2016-01-06 14:08:59 -06:00

41 Commits