1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00
Commit Graph

1124 Commits

Author SHA1 Message Date
cc1c3629c5 MCOL-987 Add LZ4 compression.
* Adds CompressInterfaceLZ4 which uses LZ4 API for compress/uncompress.
* Adds CMake machinery to search LZ4 on running host.
* All methods which use static data and do not modify any internal data - become `static`,
  so we can use them without creation of the specific object. This is possible, because
  the header specification has not been modified. We still use 2 sections in header, first
  one with file meta data, the second one with pointers for compressed chunks.
* Methods `compress`, `uncompress`, `maxCompressedSize`, `getUncompressedSize` - become
  pure virtual, so we can override them for the other compression algos.
* Adds method `getChunkMagicNumber`, so we can verify chunk magic number
  for each compression algo.
* Renames "s/IDBCompressInterface/CompressInterface/g" according to requirement.
2021-07-06 18:04:37 +03:00
8520f87237 MCOL-641 Cleanup. 2021-07-06 09:01:49 +00:00
1d5f309b8f MCOL-1205 Support queries with circular joins
This patch adds support for queries with circular joins.
Currently support added for inner joins only.
2021-07-02 18:37:07 +03:00
c20015a7b2 MCOL-4713 Analyze table implementation. 2021-07-02 12:37:12 +03:00
60495564b8 [MCOL-4709] Fix another UB in disk aggregation 2021-06-29 17:47:07 +03:00
8a0b68f25e [MCOL-4709] Fix UB in disk aggregation 2021-06-28 20:07:23 +03:00
d8cbc000e2 Merge pull request #2004 from drrtuy/MCOL-4759
MCOL-4759 Upmerge for MCOL-4564 code that implements hash merging fam…
2021-06-28 14:05:16 +03:00
8c360a1a27 MCOL-4759 Upmerge for MCOL-4564 code that implements hash merging family to reduce
performance penalty using MDB hashing functions
2021-06-24 14:48:01 +00:00
2de4888899 Merge pull request #1990 from drrtuy/MCOL-4173_9
MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI…
2021-06-24 16:15:07 +03:00
bed0b7c6bc MCOL-4173 This patch adds support for wide-DECIMAL INNER, OUTER, SEMI, functional JOINs
based on top of TypelessData
2021-06-24 08:07:23 +00:00
7c8b502dc2 Fix regression in a query involving an aggregate function on a
non-wide decimal column in the HAVING clause.

In buildAggregateColumn(), if an aggregate function (such as avg)
is applied on a non-wide decimal column, we were setting the precision
of the resulting column as -1. This later down in the execution got
converted to 255 as in some cases, precision is stored as uint8_t.
The predicate operations on a DECIMAL column has logic that uses
the wide Decimal::s128value field if precision > 18. This logic incorrectly
used the Decimal::s128value instead of the correct value stored in the
narrow Decimal::value field, since precision of the Decimal column
was 255. The fix is to set the aggregate column precision to
datatypes::INT64MAXPRECISION (18) in buildAggregateColumn() when the
aggregate is applied on a non-wide decimal column.

This commit also partially fixes -Wstrict-aliasing GCC warnings.
2021-06-22 11:11:34 +00:00
5155a08a67 Merge pull request #1987 from mariadb-corporation/bar-develop-MCOL-4700
MCOL-4700 Wrong result of a UNION for INT and INT UNSIGNED
2021-06-14 02:36:10 -04:00
67449418ed MCOL-4700 Wrong result of a UNION for INT and INT UNSIGNED 2021-06-11 19:31:51 +04:00
b3d6f62964 MCOL-4753 Performance problem in Typeless join 2021-06-10 09:26:26 +00:00
0dedb7e628 Fix compilation warnings 2021-06-09 16:51:00 +03:00
7a152c6a19 Merge pull request #1944 from mariadb-AlexeyAntipovsky/MCOL-563-dev
[MCOL-4709] Disk-based aggregation
2021-06-08 20:42:58 +03:00
475104e4d3 [MCOL-4709] Disk-based aggregation
* Introduce multigeneration aggregation

* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)

* Reduce memory usage while aggregating
* introduce in-memory generations to better memory utilization

* Try to limit the qty of buckets at a low limit

* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory

* do not dump more than half of rowgroups to disk if generations are
  allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
  so the generations start more evenly

* Unify temp data location

* Explicitly create temp subdirectories
  whether disk aggregation/join are enabled or not
2021-06-06 16:09:15 +03:00
606194e6e4 MCOL-4685: Eliminate some irrelevant settings (uncompressed data and extents per file).
This patch:
1. Removes the option to declare uncompressed columns (set columnstore_compression_type = 0).
2. Ignores [COMMENT '[compression=0] option at table or column level (no error messages, just disregard).
3. Removes the option to set more than 2 extents per file (ExtentsPreSegmentFile).
4. Updates rebuildEM tool to support up to 10 dictionary extent per dictionary segment file.
5. Adds check for `DBRootStorageType` for rebuildEM tool.
6. Renamed rebuildEM to mcsRebuildEM.
2021-06-03 14:44:33 +03:00
90397dfed0 MCOL-4675 DMLProc now automatically and gracefully shutdowns when a cluster state is set to
SS_SHUTDOWN_PENDING | SS_ROLLBACK
2021-05-27 11:07:32 +00:00
42e710f817 Merge pull request #1942 from mariadb-corporation/bar-develop-compile-10.6
Fixing 10.6 + develop compilation failure
2021-05-25 14:31:37 +03:00
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a dirt in MariaDB headers.
  Should be fixed in the server code. my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  in some cases. The former is not CMake compatible style. The latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.

Observations:
- It's OK to include both mcsconfig.h and my_config.h providing that we
  suppress duplicate definition of the above conflicting types #1 and #2.
- There is no a need to suppress duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuation header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceeding configuration header.

This change make sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (with their later restoring).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now only be included only after my_config.h
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be only included after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcscofig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h now can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    #    Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the fix of the complitation failure it appeared that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
284fc51bb7 MCOL-4726 Wrong result of WHERE char1_col='A' 2021-05-21 14:40:16 +04:00
bd4cbb542d MCOL-4721 CHAR(1) is not collation-aware for GROUP/DISTINCT 2021-05-18 16:14:53 +04:00
78cca01dfa Merge pull request #1899 from tntnatbry/MCOL-4612
MCOL-4612 A subquery with a union for DECIMAL and BIGINT returns zeros.
2021-05-03 02:52:23 -04:00
71c16fcb56 Merge pull request #1909 from dhall-MariaDB/MCOL-4643
MCOL-4643 reset valOut after processing UDAF
2021-04-30 14:35:07 -05:00
f4e6939139 MCOL-4643 dev 5 reset valOut after processing UDAF
After a UDAF result has been inserted in the output stream, the valOut object needs to be reset to empty in preparation for the next value. Failing to do so may cause what should be a NULL value to erroneously take the last value inserted.
2021-04-30 10:57:40 -05:00
6d138a4963 Merge pull request #1905 from benthompson15/MCOL-4044
MCOL-4044: Add oracle mode functions.
2021-04-30 10:13:17 -05:00
4e9307fa6d MCOL-4612 A subquery with a union for DECIMAL and BIGINT returns zeros.
In this patch, we set the unioned type to a wide decimal, if any of the
numeric columns involved in the union operation have a precision > 18
(which is also possible for BIGINT/UBIGINT types) and <= 38.
2021-04-30 12:33:33 +00:00
1eea9f9e47 MCOL-4598: Fix the syslog setup script. Add syslog options for broken/non-syslog setup. 2021-04-29 16:34:53 -05:00
2aa5380d51 Remove global lock from OAMCache
Config now uses a single atomic variable to speed up its operations
Config has a special method to re-read a config file if it changed on disk
2021-04-23 15:41:29 +00:00
d6e88cd82e MCOL-4693 ColumnStore MTR tests: FUNCTION mcs192_db.CORR does not exist 2021-04-22 18:29:08 +04:00
870d672efb MCOL-4044: Add oracle mode functions. 2021-04-21 16:07:42 -05:00
c9b353e975 Simplify PMS connection entries configuration 2021-04-16 10:52:13 +00:00
757f8d00a5 A plugable PoorManProfiler singleton 2021-04-14 10:54:46 +00:00
362bfcd15e MCOL-4361 Replace pow(10.0, (double)scale) expressions with a static dictionary lookup. 2021-04-09 12:41:04 +04:00
912cbe641e Removing func_bitand.cpp
It was a dead code. It was not even a part of the soures in CMakeList.txt.

The bit AND operator implemententation resides in func_bitwise.cpp
together with all other bit operators and functions.
2021-04-08 11:34:22 +04:00
47b1ea1cf9 Merge pull request #1849 from mariadb-corporation/bar-develop-MCOL-4666
MCOL-4666 Empty set when using BIT OR and BIT AND functions in WHERE
2021-04-08 03:04:54 -04:00
a6a85d157d MCOL-4666 Empty set when using BIT OR and BIT AND functions in WHERE 2021-04-07 14:37:39 +04:00
1c54df4ab7 Merge pull request #1851 from mariadb-corporation/bar-develop-MCOL-4668
MCOL-4668 PERIOD_DIFF(dec_or_double1,dec_or_double2) is not as in InnoDB
2021-04-07 02:12:50 -04:00
f33b860580 Merge pull request #1841 from mariadb-corporation/bar-develop-MCOL-4614
A joint patch for MCOL-4614 and MCOL-4615 (decimal to string conversion)
2021-04-06 02:13:38 -04:00
9f41f574da MCOL-4668 PERIOD_DIFF(dec_or_double1,dec_or_double2) is not as in InnoDB 2021-04-06 08:15:18 +04:00
69911c2710 A joint patch for MCOL-4614, MCOL-4615, MCOL-4660 (decimal to string conversion)
This patch fixes:
- MCOL-4614 calShowPartitions() precision loss for huge narrow decimal
- MCOL-4615 GROUP_CONCAT() precision loss for huge narrow decimal
- MCOL-4660 Narow decimal to string conversion is inconsistent about zero integral

Changes:
- Implementing Row::getDecimalField()

- Removing double arithmetic from the code printing DECIMAL values
  in TypeHandlerXDecimal::format64() and GroupConcator::outputRow().
  Using Decimal::toString() instead.

- Rewriting Decimal::toStringTSInt64(). The old implementation
  was wrong, too complex and slow (used unnecessary memmove, memcpy).

An additional cleanup:
- Removing the ENGINE=COLUMNSTORE clause from tests for MCOL-4532 and MCOL-4640
  type_decimal.test is combinations-aware. It's run two times with
  default_storage_engine=MyISAM and default_storage_engine=COLUMNSTORE.
  So the CREATE TABLE statements should not specify the engine explicitly.
- Adding --disable_warnings in the old fixed test.
  We needed to suppress warnings when the MyISAM combination is being run.
  Previously the table was erroneously created with ENGINE=COLUMNSTORE
  even with the MyISAM combination run. So warning were not generated.
2021-04-05 16:36:19 +04:00
a3db5bde36 Merge pull request #1839 from mariadb-corporation/bar-develop-MCOL-4649
Fixing DOUBLE-to-[U]INT conversion (MCOL-4649, MCOL-4631, MCOL-4647)
2021-04-05 05:55:37 -04:00
a86f432f35 Fixing DOUBLE-to-[U]INT conversion (MCOL-4649, MCOL-4631, MCOL-4647)
Bugs fixed:
- MCOL-4649 CAST(double AS UNSIGNED) returns 0
- MCOL-4631 CAST(double AS SIGNED) returns 0 or NULL
- MCOL-4647 SEC_TO_TIME(double_or_float) returns a wrong result

Problems:
- The code in Func_cast_unsigned::getUintVal() and
  Func_cast_signed::getIntVal() did not properly check
  the double value to fit inside a uint64_t/int64_t range.
  So the corner cases:
  - numeric_limits<uint64_t>::max()-2 for uint64_t
  - numeric_limits<int64_t>::max() for int64_t
  produced unexpected results.

  The problem was in tests like this:
    if (value > (double) numeric_limits<int64_t>::max())
  A correct test would be:
    if (value >= (double) numeric_limits<int64_t>::max())

- The code in Func_sec_to_time::getStrVal() searched for the decimal
  dot character, assuming that the next character after the dot
  was the leftmost fractional digit.
  This assumption was wrong because huge double numbers use
  scientific notation. So for example in "2.5e-40" the
  digit "5" following the dot is NOT the leftmost fractional digit.
  Also, the code in Func_sec_to_time::getStrVal() was slow
  because of using non necessary to-string and from-string
  data conversion.
  Also, the code in Func_sec_to_time::getStrVal() evaluated
  the argument two times: using getStrVal() then using getIntVal().

Solution:
- Adding new classes TDouble and TLongDouble.
- Adding a few function templates to reuse the code easier.
- Moving the conversion code inside TDouble and TLongDouble
  methods toMCSSInt64Round() and toMCSUInt64Round().
- Reusing new classes and their methods in func_cast.cc and
  func_sec_to_time.cc.
2021-04-05 11:30:52 +04:00
05863a3fb5 Merge pull request #1808 from denis0x0D/MCOL-4566/rebuild_em_compressed
MCOL-4566: Add rebuildEM tool support to work with compressed files.
2021-04-02 17:52:34 +03:00
a0b46425dc Merge pull request #1787 from mariadb-corporation/bar-develop-like
MCOL-4498 LIKE is not collation aware
2021-04-02 11:57:06 +03:00
5d497e8821 MCOL-4566: Add rebuildEM tool support to work with compressed files.
* This patch adds rebuildEM tool support to work with compressed files.
* This patch increases a version of the file header.

Note: Default version of the `rebuildEM` tool was using very old API,
those functions are not present currently. So `rebuildEM` will not work with
files created without compression, because we cannot deduce some info which are
needed to create column extent.
2021-04-02 10:55:01 +03:00
f3766e40e4 Merge pull request #1836 from mariadb-corporation/bar-develop-MCOL-4618
A joint patch fixing MCOL-4618 and MCOL-4653:
2021-04-01 06:01:44 -04:00
e19096a91a A joint patch fixing MCOL-4618 and MCOL-4653:
- MCOL-4618 FLOOR(-9999.0) returns a bad result
- MCOL-4653 CEIL(negativeNarrowDecimal) returns a wrong result

Main changes:
a. Moving ROUND, CEIL, FLOOR related code into a new simple
   class template DecomposedDecimal, which is reused for
   64 and 128 bit decimal.
b. Using DecomposedDecimal in TDecimal64 and TDecimal128
   to implement ROUND, CEIL, FLOOR related methods.
c. Adding corresponding wrapper methods to the class Decimal.
d. Using new Decimal methods in Func_ceil and Func_floor.

Additional minor changed:
- Adding "explicit" to TSInt128 constructors to avoid
  hidden data type conversion and erroneous choice between
  64 vs 128 bit APIs when using Decimal.
  Now one can call constructors in this self explanatory way:
  - Decimal(TSInt128(some_int_value), scale, precision)
    to create a wide decimal
  - Decimal(TSInt64(some_int_value, scale, precision)
    to create a narrow decimal

  TODO:
  Consider changing
    Decimal(int64_t val, int8_t s, uint8_t p, const int128_t &val128 = 0)
  to
    Decimal(int64_t val, int8_t s, uint8_t p, const int128_t &val128)
  (or even removing this constructor) to disallow compilation of:
    Decimal(some_trivial_type_value, scale, precision)
2021-04-01 09:47:22 +04:00
c98d0c997d MCOL-4386: Update libmarias3 ref 2021-03-30 16:13:39 -05:00