1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-06-10 17:41:44 +03:00
Commit Graph

197 Commits

Author SHA1 Message Date
8c360a1a27 MCOL-4759 Upmerge for MCOL-4564 code that implements hash merging family to reduce
performance penalty using MDB hashing functions
2021-06-24 14:48:01 +00:00
2de4888899 Merge pull request #1990 from drrtuy/MCOL-4173_9
MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI…
2021-06-24 16:15:07 +03:00
bed0b7c6bc MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, functional JOINs
based on top of TypelessData
2021-06-24 08:07:23 +00:00
7c8b502dc2 Fix regression in a query involving an aggregate function on a
non-wide decimal column in the HAVING clause.

In buildAggregateColumn(), if an aggregate function (such as avg)
is applied on a non-wide decimal column, we were setting the precision
of the resulting column as -1. This later down in the execution got
converted to 255 as in some cases, precision is stored as uint8_t.
The predicate operations on a DECIMAL column has logic that uses
the wide Decimal::s128value field if precision > 18. This logic incorrectly
used the Decimal::s128value instead of the correct value stored in the
narrow Decimal::value field, since precision of the Decimal column
was 255. The fix is to set the aggregate column precision to
datatypes::INT64MAXPRECISION (18) in buildAggregateColumn() when the
aggregate is applied on a non-wide decimal column.

This commit also partially fixes -Wstrict-aliasing GCC warnings.
2021-06-22 11:11:34 +00:00
475104e4d3 [MCOL-4709] Disk-based aggregation
* Introduce multigeneration aggregation

* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)

* Reduce memory usage while aggregating
* introduce in-memory generations to better memory utilization

* Try to limit the qty of buckets at a low limit

* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory

* do not dump more than half of rowgroups to disk if generations are
  allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
  so the generations start more evenly

* Unify temp data location

* Explicitly create temp subdirectories
  whether disk aggregation/join are enabled or not
2021-06-06 16:09:15 +03:00
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a dirt in MariaDB headers.
  Should be fixed in the server code. my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  in some cases. The former is not CMake compatible style. The latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.

Observations:
- It's OK to include both mcsconfig.h and my_config.h providing that we
  suppress duplicate definition of the above conflicting types #1 and #2.
- There is no a need to suppress duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuation header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceeding configuration header.

This change make sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (with their later restoring).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now only be included only after my_config.h
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be only included after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcscofig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h now can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    #    Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the fix of the complitation failure it appeared that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
765858bc5b MCOL-4498 LIKE is not collation aware 2021-03-22 20:42:01 +04:00
afa88866bb MCOL-4483: Fix and consolidate log files and cpimport logging. 2021-02-12 15:40:16 -06:00
69da915160 MCOL-4531 New string-to-decimal conversion implementation
This change fixes:

MCOL-4462 CAST(varchar_expr AS DECIMAL(M,N)) returns a wrong result
MCOL-4500 Bit functions processing throws internally trying to cast char into decimal representation
MCOL-4532 CAST(AS DECIMAL) returns a garbage for large values

Also, this change makes string-to-decimal conversion 5-10 times faster,
depending on exact data.
Performance implemenent is achieved by the fact that (unlike in the old
implementation), the new version does not do any "string" object copying.
2021-02-09 13:02:27 +04:00
5fce19df0a MCOL-4412 Introduce TypeHandler::getEmptyValueForType to return const ptr for an empty value
WE changes for SQL DML and DDL operations

Changes for bulk operations

Changes for scanning operations

Cleanup
2021-01-18 12:30:17 +00:00
b08d719593 A cleanup for MCOL-4064 Make JOIN collation aware
A non-JOIN condition like `WHERE c1=c2` (with c1 and c2 being columns of the
same table) was not collation-aware yet after the main patches for MCOL-4064.

Additionally fixing StrFilterCmd::compare*() to address this.
2020-12-08 16:43:07 +04:00
c6158eee31 Part#1 MCOL-4064 Make JOIN collation aware
Making field1=field2 collation aware for long CHAR/VARCHAR.
2020-12-04 08:41:26 +04:00
52c5af054a Part#2 MCOL-495 Make string comparison not case sensitive
Fixing field='str' for short (non-Dict) CHAR and VARCHAR data types.
2020-12-04 08:40:29 +04:00
0ff6a6ec20 Part#1 MCOL-495 Make string comparison not case sensitive
Fixing field='str' for long (Dict) string data types.
2020-12-04 07:49:00 +04:00
2ea73846b9 MCOL-4422 Remove mariadb.h and my_sys.h dependency from collation.h 2020-11-30 14:26:35 +04:00
995cadef2d MCOL-641 Fix alter table add wide decimal column.
This patch also removes CalpontSystemCatalog::BINARY and
ddlpackage::DDL_BINARY that were added during the initial
stages of the work on MCOL-641.
2020-11-20 19:49:54 -05:00
aa44bca473 A pack of fixes for compilation errors and warnings for all platforms
Add libdatatypes.so into debian packaging
2020-11-19 10:21:45 +00:00
3eb26c0d4a MCOL-4313 Introduced TSInt128 that is a storage class for int128
Removed uint128 from joblist/lbidlist.*

Another toString() method for wide-decimal that is EMPTY/NULL aware

Unified decimal processing in WF functions

Fixed a potential issue in EqualCompData::operator() for
    wide-decimal processing

Fixed some signedness warnings
2020-11-18 13:53:15 +00:00
d5c6645ba1 Adding mcs_basic_types.h
For now it consists of only:

using int128_t = __int128;
using uint128_t = unsigned __int128;

All new privitive data types should go into this file in the future.
2020-11-18 13:53:15 +00:00
129d5b5a0f MCOL-4174 Review/refactor frontend/connector code 2020-11-18 13:53:15 +00:00
8de9764f84 MCOL-4172 Add support for wide-DECIMAL into statistical aggregate and regr_* UDAF functions
The patch fixes wrong results returned when multiple UDAF exist in projection

aggregate over wide decimal literals now works
2020-11-18 13:52:20 +00:00
af80081c94 MCOL-4171 Some fixes 2020-11-18 13:52:20 +00:00
1588ebe439 MCOL-641 Clean up primitives code
Add int128_t support into ByteStream

Fixed UTs broken after collation patch
2020-11-18 13:52:19 +00:00
d3bc68b02f MCOL-641 Refactor initial extent elimination support.
This commit also adds support in TupleHashJoinStep::forwardCPData,
although we currently do not support wide decimals as join keys.

Row estimation to determine large-side of the join is also updated.
2020-11-18 13:52:19 +00:00
6aea838360 MCOL-641 Add support for functions (Part 2). 2020-11-18 13:51:55 +00:00
a7fcf39f2a MCOL-641 Fixed group_concat for narrow-DECIMALs. 2020-11-18 13:51:26 +00:00
74b64eb4f1 MCOL-641 1. Add support for int128_t in ParsedColumnFilter.
2. Set Decimal precision in SimpleColumn::evaluate().
3. Add support for int128_t in ConstantColumn.
4. Set IDB_Decimal::s128Value in buildDecimalColumn().
5. Use width 16 as first if predicate for branching based on decimal width.
2020-11-18 13:47:45 +00:00
b09f3088ca MCOL-641 Initial version of Math operations for wide decimal. 2020-11-18 13:47:44 +00:00
62d0c82d75 MCOL-641 1. Templatized convertValueNum() function.
2. Allocate int128_t buffers in batchprimitiveprocessor if
a query involves wide decimal columns.
2020-11-18 13:47:44 +00:00
2e8e7d52c3 Renamed datatypes/decimal.* into csdecimal to avoid collision with MDB. 2020-11-18 13:47:44 +00:00
824615a55b MCOL-641 Refactor empty value implementation in writeengine. 2020-11-18 13:47:44 +00:00
97ee1609b2 MCOL-641 Replaced NULL binary constants.
DataConvert::decimalToString, toString, writeIntPart, writeFractionalPart are not templates anymore.
2020-11-18 13:47:44 +00:00
8f80c1dee6 MCOL-641 1. Implement int128 version of strtoll.
2. Templatize number_int_value.
3. Add test cases for strtoll128 and number_int_value for Decimal38.
2020-11-18 13:47:02 +00:00
c23ead2703 MCOL-641 This commit changes NULL and EMPTY values.
It also contains the refactored DataConvert::decimalToString().

Row::toString UT is finished.
2020-11-18 13:47:02 +00:00
f73de30427 MCOL-641 This commit introduces GTest Suite into CS.
Binary NULL magic now consists of a series of BINARYEMPTYROW-s + BINARYNULL
in the end.

ByteStream now has hexbyte alias.

Added ColumnCommand::getEmptyRowValue to support 16 byte EMPTY values.
2020-11-18 13:47:01 +00:00
2eb5af1d24 MCOL-641 This commit adds support for SIGNED and ZEROFILL keywords in
CREATE TABLE. ZEROFILL is dummy though.

There is a new file with column width utilities.

Array access was replaced by a variable that is calculated only once in
TupleJoiner::updateCPData.
2020-11-18 13:47:01 +00:00
32f6167067 MCOL-641 Work of Ivan Zuniga on basic read and write support for Binary16 2020-11-18 13:47:00 +00:00
b25fee320a Remove variable-length arrays (-Wvla) 2020-11-17 15:03:10 +03:00
0e29b0b0f9 Fix -Wtype-limits 2020-11-17 15:03:10 +03:00
ab44ef6ddb MCOL-4170 Refactor services/systemd units to finish their bootstrap ... 2020-11-09 12:01:16 +04:00
2ba9263df4 Silence -Werror=implicit-fallthrough compiler errors - Patch from Monty.
The patch also fixes some potential bugs due to missing break
statements.
2020-06-26 12:32:57 -04:00
73e3974532 MCOL-4030: any.h dtor fix. 2020-06-24 16:42:57 -05:00
88d2fe6d03 MCOL-4030: any.h non-virtual-dtor warning suppressed. 2020-06-24 13:37:50 -05:00
de125bac2b MCOL-3536 Collation phase 2 2020-06-18 13:54:16 -05:00
adda6567e2 MCOL-3536 Collation 2020-06-08 17:58:59 -05:00
f9078efbc6 MCOL-3536 Collation 2020-06-08 17:57:37 -05:00
4832ec83f8 MCOL-3536 Collation 2020-06-07 14:07:10 -05:00
8479a87e46 Merge branch 'develop' into MCOL-3536 2020-05-18 16:22:01 -05:00
1f3d1e6fd6 MCOL-3536 collation 2020-05-14 16:02:49 -05:00
98abf95eae MCOL-3991 MCS is now single package and properly uninstalls 2020-05-12 13:36:24 +00:00