mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00
Commit Graph

239 Commits

Author SHA1 Message Date
6a4140394d [MCOL-4829] More accurate memory counting 2021-09-07 19:52:20 +03:00
7fea3c988e [MCOL-4829] Compression for the temp disk-based aggregation files 2021-09-02 19:30:25 +03:00
46cf13ffa8 Merge pull request #2101 from denis0x0D/MCOL-4810_2
MCOL-4810 Redundant copying and wasting memory in PrimProc
2021-08-27 14:05:51 +03:00
7bda598fbf MCOL-4810 Redundant copying and wasting memory in PrimProc
This patch eliminates the copying of `long string`s into the bytestream.
2021-08-26 12:16:23 +03:00
5c5f103f98 MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
73e710ed52 Add ctest for google unittests 2021-08-02 19:41:04 +03:00
3d557a2f1e Merge pull request #2044 from dhall-MariaDB/MCOL-3738
MCOL-3738 COUNT(DISTINCT) with multiple parms
2021-07-12 07:34:56 -04:00
51a8ffcb6a Fix sumavgoverflow.sql test 2021-07-09 22:41:28 +00:00
76607be63a MCOL-3738 COUNT(DISTINCT) with multiple parms
Fixed regression
Added a few more mtr tests
2021-07-09 09:07:03 -05:00
f81f743282 Replace underlying type for avg and sum for int types from long double to wide decimal 2021-07-08 17:04:43 +00:00
1113470551 MCOL-4738 AVG gives wrong results with strict_aliasing
A fix that works with strict_aliasing
2021-07-07 13:08:32 -05:00
8988253ff4 Merge pull request #2031 from mariadb-corporation/bar-develop-MCOL-4801
MCOL-4801 Replace Row methods getStringLength() and getStringPointer(…
2021-07-07 13:53:19 +04:00
8332ab8974 MCOL-4738 AVG() returns a wrong result
On AMD64 machines, the FPU is 80 bits wide. The unused bits must be masked for memcmp to work properly. For other architectures, we don't want to mask those bits.
2021-07-06 19:50:00 -05:00
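The masking idea in the MCOL-4738 message above can be illustrated with a minimal sketch. It is not the code from the patch: the names `kValueBytes`, `maskPadding` and `sameLongDouble` are hypothetical, and the sketch assumes the x87 extended format (detected here through `LDBL_MANT_DIG == 64`), where only the first 10 bytes of a `long double` carry the value and the remaining bytes are padding.

```cpp
#include <cfloat>
#include <cstddef>
#include <cstring>

// Assumption: with a 64-bit mantissa (x87 extended precision) only the first
// 10 bytes of a long double are meaningful. On other formats every byte counts.
constexpr std::size_t kValueBytes =
    (LDBL_MANT_DIG == 64) ? 10 : sizeof(long double);

// Zero the padding bytes of a raw long double image so that memcmp
// only ever sees bytes that belong to the value.
inline void maskPadding(unsigned char* buf)
{
  if (kValueBytes < sizeof(long double))
    std::memset(buf + kValueBytes, 0, sizeof(long double) - kValueBytes);
}

// Bytewise comparison of two long doubles that ignores padding.
inline bool sameLongDouble(const long double& a, const long double& b)
{
  unsigned char x[sizeof(long double)];
  unsigned char y[sizeof(long double)];
  std::memcpy(x, &a, sizeof(long double));
  std::memcpy(y, &b, sizeof(long double));
  maskPadding(x);
  maskPadding(y);
  return std::memcmp(x, y, sizeof(long double)) == 0;
}
```

On formats where all bytes are significant, `maskPadding` does nothing, which matches the note that other architectures must not be masked.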
9794f24369 MCOL-4801 Replace Row methods getStringLength() and getStringPointer() to getConstString() 2021-07-06 21:15:32 +04:00
8520f87237 MCOL-641 Cleanup. 2021-07-06 09:01:49 +00:00
60495564b8 [MCOL-4709] Fix another UB in disk aggregation 2021-06-29 17:47:07 +03:00
8a0b68f25e [MCOL-4709] Fix UB in disk aggregation 2021-06-28 20:07:23 +03:00
8c360a1a27 MCOL-4759 Upmerge for MCOL-4564 code that implements hash merging family to reduce
performance penalty using MDB hashing functions
2021-06-24 14:48:01 +00:00
bed0b7c6bc MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, functional JOINs
based on top of TypelessData
2021-06-24 08:07:23 +00:00
b3d6f62964 MCOL-4753 Performance problem in Typeless join 2021-06-10 09:26:26 +00:00
0dedb7e628 Fix compilation warnings 2021-06-09 16:51:00 +03:00
475104e4d3 [MCOL-4709] Disk-based aggregation
* Introduce multigeneration aggregation

* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)

* Reduce memory usage while aggregating
* introduce in-memory generations for better memory utilization

* Try to limit the qty of buckets at a low limit

* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory

* do not dump more than half of rowgroups to disk if generations are
  allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
  so the generations start more evenly

* Unify temp data location

* Explicitly create temp subdirectories
  whether disk aggregation/join are enabled or not
2021-06-06 16:09:15 +03:00
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a flaw in the MariaDB headers.
  It should be fixed in the server code. In some cases my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  The former is not CMake-compatible style. The latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.

Observations:
- It's OK to include both mcsconfig.h and my_config.h provided that we
  suppress duplicate definitions of the above conflicting types #1 and #2.
- There is no need to suppress duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuration header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceding configuration header.

This change makes sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (and restoring them later).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now be included only after my_config.h.
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be included only after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcsconfig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"
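
The remember / undefine / restore sequence described above can be sketched with `#pragma push_macro` and `#pragma pop_macro`, which GCC, Clang and MSVC support. This is only an illustration under assumptions: the commit text does not say which mechanism the helper headers use, the two `#define PACKAGE_NAME` lines stand in for what my_config.h and mcsconfig.h would define, and the function names are hypothetical.

```cpp
#include <string>

// Stand-in for a definition made earlier by my_config.h (assumed value).
#define PACKAGE_NAME "server"

// Step 1: remember the current definition on the preprocessor stack,
// then undefine it so the next header can define it without a conflict.
#pragma push_macro("PACKAGE_NAME")
#undef PACKAGE_NAME

// Step 2: stand-in for #include "mcsconfig.h" (assumed value).
#define PACKAGE_NAME "columnstore"

// Code placed here sees the engine-side definition.
inline std::string engineSidePackageName() { return PACKAGE_NAME; }

// Step 3: drop the engine-side definition and restore the remembered one.
#undef PACKAGE_NAME
#pragma pop_macro("PACKAGE_NAME")

// Code placed after the restore sees the original definition again.
inline std::string serverSidePackageName() { return PACKAGE_NAME; }
```

Each macro is expanded where it appears in the source, so the two functions return different strings even though they use the same macro name.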

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the fix of the compilation failure it appeared that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
bd4cbb542d MCOL-4721 CHAR(1) is not collation-aware for GROUP/DISTINCT 2021-05-18 16:14:53 +04:00
f4e6939139 MCOL-4643 dev 5 reset valOut after processing UDAF
After a UDAF result has been inserted in the output stream, the valOut object needs to be reset to empty in preparation for the next value. Failing to do so may cause what should be a NULL value to erroneously take the last value inserted.
2021-04-30 10:57:40 -05:00
362bfcd15e MCOL-4361 Replace pow(10.0, (double)scale) expressions with a static dictionary lookup. 2021-04-09 12:41:04 +04:00
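A minimal sketch of the kind of replacement MCOL-4361 describes: a static table of integer powers of ten indexed by scale, used instead of calling `pow(10.0, (double)scale)`. The names `kPow10` and `pow10Lookup` are hypothetical and are not taken from the ColumnStore sources. The sketch covers only scales 0 to 18, which is the range that fits in `int64_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Powers of ten from 10^0 to 10^18; 10^19 no longer fits in int64_t.
static const int64_t kPow10[19] = {
    1LL,
    10LL,
    100LL,
    1000LL,
    10000LL,
    100000LL,
    1000000LL,
    10000000LL,
    100000000LL,
    1000000000LL,
    10000000000LL,
    100000000000LL,
    1000000000000LL,
    10000000000000LL,
    100000000000000LL,
    1000000000000000LL,
    10000000000000000LL,
    100000000000000000LL,
    1000000000000000000LL};

// Exact integer lookup; throws instead of reading past the table.
inline int64_t pow10Lookup(std::size_t scale)
{
  if (scale >= sizeof(kPow10) / sizeof(kPow10[0]))
    throw std::out_of_range("scale is outside the lookup table");
  return kPow10[scale];
}
```

Beyond avoiding a floating-point call per value, an integer table returns exact results for large scales, where a `double` can no longer represent every power of ten precisely.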
69911c2710 A joint patch for MCOL-4614, MCOL-4615, MCOL-4660 (decimal to string conversion)
This patch fixes:
- MCOL-4614 calShowPartitions() precision loss for huge narrow decimal
- MCOL-4615 GROUP_CONCAT() precision loss for huge narrow decimal
- MCOL-4660 Narrow decimal to string conversion is inconsistent about zero integral

Changes:
- Implementing Row::getDecimalField()

- Removing double arithmetic from the code printing DECIMAL values
  in TypeHandlerXDecimal::format64() and GroupConcator::outputRow().
  Using Decimal::toString() instead.

- Rewriting Decimal::toStringTSInt64(). The old implementation
  was wrong, too complex and slow (used unnecessary memmove, memcpy).

An additional cleanup:
- Removing the ENGINE=COLUMNSTORE clause from tests for MCOL-4532 and MCOL-4640
  type_decimal.test is combinations-aware. It's run two times with
  default_storage_engine=MyISAM and default_storage_engine=COLUMNSTORE.
  So the CREATE TABLE statements should not specify the engine explicitly.
- Adding --disable_warnings in the old fixed test.
  We needed to suppress warnings when the MyISAM combination is being run.
  Previously the table was erroneously created with ENGINE=COLUMNSTORE
  even with the MyISAM combination run. So warnings were not generated.
2021-04-05 16:36:19 +04:00
0eee6cfc62 MCOL-4643 reset valOut after UDAF evaluation 2021-03-26 16:09:15 -05:00
13b7a794e4 MCOL-4620 Add charset to various RowGroup initializers
Specifically to operator+=
2021-03-19 16:57:54 -05:00
5080e1ae53 MCOL-4031 More accurate memory usage counting while sorting 2021-01-29 18:31:20 +03:00
a687df48b9 MCOL-4065 DISTINCT is case sensitive
This patch makes DISTINCT and GROUP BY collation aware.
2021-01-21 15:46:54 +04:00
5b9689ce55 MCOL-4478 MCS now rounds the last digits of an avg() result for wide-DECIMAL argument 2020-12-30 15:02:12 +00:00
5815c5c526 MCOL-4452 RowAggregationUMP2::doUDAF() now calls setUserData() using a correct UDAF context 2020-12-22 15:43:51 +00:00
494bde61e1 MCOL-4409 Moved static Decimal conversion methods into VDecimal class
MCOL-4409 This patch combines VDecimal and Decimal and makes
IDB_Decimal an alias for the result class

MCOL-4409 More boilerplate reduction in Func_mod

Removed couple TSInt128::toType() methods
2020-11-30 12:08:52 +00:00
2003417a89 Merge pull request #1624 from mariadb-corporation/develop-bar-MCOL-4422
MCOL-4422 Remove mariadb.h and my_sys.h dependency from collation.h
2020-11-30 15:01:17 +03:00
2ea73846b9 MCOL-4422 Remove mariadb.h and my_sys.h dependency from collation.h 2020-11-30 14:26:35 +04:00
a53119d5d5 Fix crash in release builds that happens in RowGroup::initRow() for wide DECIMAL 2020-11-30 08:17:27 +00:00
995cadef2d MCOL-641 Fix alter table add wide decimal column.
This patch also removes CalpontSystemCatalog::BINARY and
ddlpackage::DDL_BINARY that were added during the initial
stages of the work on MCOL-641.
2020-11-20 19:49:54 -05:00
178be69bc4 MCOL-4394 __float128 related code had been moved into a separate file
Trim to double and to long double conversions for Decimal
2020-11-19 12:08:18 +00:00
58495d0d2f MCOL-4387 Convert dataconvert::decimalToString() into VDecimal and TSInt128 methods 2020-11-18 13:53:16 +00:00
c00daa93bd MCOL-4172 MultiDistinctRowAggregation didn't honor multiple UDAF in projection
::doUDAF() doesn't crash anymore trying to access fRGContextColl[] elements
that don't exist when running RowAggregationMultiDistinct::doAggregate()
2020-11-18 13:53:15 +00:00
15b1bfa709 Fix fallthrough compilation warnings 2020-11-18 13:53:15 +00:00
3eb26c0d4a MCOL-4313 Introduced TSInt128 that is a storage class for int128
Removed uint128 from joblist/lbidlist.*

Another toString() method for wide-decimal that is EMPTY/NULL aware

Unified decimal processing in WF functions

Fixed a potential issue in EqualCompData::operator() for
    wide-decimal processing

Fixed some signedness warnings
2020-11-18 13:53:15 +00:00
d5c6645ba1 Adding mcs_basic_types.h
For now it consists of only:

using int128_t = __int128;
using uint128_t = unsigned __int128;

All new primitive data types should go into this file in the future.
2020-11-18 13:53:15 +00:00
129d5b5a0f MCOL-4174 Review/refactor frontend/connector code 2020-11-18 13:53:15 +00:00
68244ab957 MCOL-641 Fix regression in aggregate distinct on narrow decimal.
The else if block in Row::equals() was incorrectly getting triggered
for narrow decimals earlier. We now specifically check if the column
is a wide decimal. Furthermore, we need to dereference the int128_t
pointers for equality comparison.
2020-11-18 13:52:20 +00:00
844472d812 MCOL-4313 Very fragile but high speed approach with inline ASM
The GCC compiler uses aligned versions of SIMD instructions, expecting
aligned memory blocks, which is hard to implement now
2020-11-18 13:52:20 +00:00
8de9764f84 MCOL-4172 Add support for wide-DECIMAL into statistical aggregate and regr_* UDAF functions
The patch fixes wrong results returned when multiple UDAF exist in projection

aggregate over wide decimal literals now works
2020-11-18 13:52:20 +00:00
1c3a34a3d0 Dataconvert::decimalToString badly fails w/o 20th member of mcs_pow_10 so I returned it
WF::percentile runtime threw an exception b/c of wrong DT deduced from its argument

Replaced literals with constants

Taught WF_sum_avg::checkSumLimit to use refs instead of values
2020-11-18 13:52:20 +00:00
af80081c94 MCOL-4171 Some fixes 2020-11-18 13:52:20 +00:00