1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-04-18 21:44:02 +03:00

134 Commits

Author SHA1 Message Date
Leonid Fedorov
43de58f80d
MCOL-5746 Cpimport: convert blob data from ascii hex when reads from STDIN (#3218)
Co-authored-by: Denis Khalikov <dennis.khalikov@gmail.com>
2024-06-27 14:24:16 +04:00
Sergei Golubchik
0f9d2d978d cpimport.bin doesn't really need libboost_program_options
linking with unused libraries creates a difference in dependencies
between --no-as-needed (gcc default) and --as-needed (default on
Fedora rpmbuild) builds.
2023-06-29 18:39:35 -04:00
Leonid Fedorov
2f153184c3
Fixes of bugs from ASAN warnings, part one (#2796) 2023-03-30 18:29:04 +03:00
Leonid Fedorov
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
Leonid Fedorov
b936ed8b2e Fix some GCC-12 Build errors 2022-11-22 03:28:17 +03:00
Leonid Fedorov
37fd915a08 Serg`s patch for develop-6 revised for develop https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/2614 2022-11-09 22:41:38 +00:00
Gagan Goel
cbfdae3481 MCOL-5021 Code changes based on review feedback. 2022-08-05 14:40:50 -04:00
Gagan Goel
262cd5c501 MCOL-5021 Remove hard-coded values for data type, column width
and compression type for the AUX column, and replace them with
constants defined in the execplan namespace.
2022-08-05 14:40:49 -04:00
Gagan Goel
60eb0f86ec MCOL-5021 non-AUX column files are opened in read-only mode during
the DELETE operation. ColumnOp::readBlock() calls can cause writes
to database files when the active chunk list in ChunkManager is full.
Since non-AUX columns are read-only for the DELETE operation, we prevent
writes of compressed chunks and header for these columns by passing
an isReadOnly flag to CompFileData which indicates whether the column
is read-only or read-write.
2022-08-05 14:40:49 -04:00
Gagan Goel
35a3a93964 MCOL-5021 For the DELETE operation, empty magic values are only
written to database files for AUX column. Perform read-only operation
for other columns in the table to update the Casual Partitioning information.
2022-08-05 14:40:49 -04:00
Gagan Goel
86df9a972c MCOL-5021 Add prototype support for the AUX column in CREATE/DROP
DDL commands, single and multi-value INSERTs, cpimport, and
DELETE.
2022-08-05 14:40:49 -04:00
Leonid Fedorov
f5b2a6885f MCOL-5013: Load Data from S3 into Columnstore
Introduced UDF and stored prodecure.
usage:

set columnstore_s3_key='<s3_key>';
set columnstore_s3_secret='<s3_secret>';
set columnstore_s3_region='region';

and then use UDF
select columnstore_dataload("<tablename>", "<filename>", "<bucket>", "<db_name>");
for UDF db_name can be ommited, then current connection db will be used

or stored function
call calpontsys.columnstore_load_from_s3("<tablename>", "<filename>", "<bucket>", "<db_name>");
2022-07-04 19:52:37 +03:00
Roman Nozdrin
4c26e4f960 MCOL-4912 This patch introduces Extent Map index to improve EM scaleability
EM scaleability project has two parts: phase1 and phase2.
        This is phase1 that brings EM index to speed up(from O(n) down
        to the speed of boost::unordered_map) EM lookups looking for
        <dbroot, oid, partition> tuple to turn it into LBID,
        e.g. most bulk insertion meta info operations.
        The basis is boost::shared_managed_object where EMIndex is
        stored. Whilst it is not debug-friendly it allows to put a
        nested structs into shmem. EMIndex has 3 tiers. Top down description:
        vector of dbroots, map of oids to partition vectors, partition
        vectors that have EM indices.
        Separate EM methods now queries index before they do EM run.
        EMIndex has a separate shmem file with the fixed id
        MCS-shm-00060001.
2022-05-04 12:59:16 +00:00
benthompson15
2ec502aaf3
MCOL-4576: remove S3 options from cpimport. (#2328) 2022-04-01 09:18:34 -05:00
Leonid Fedorov
65252df4f6 C++20 fixes 2022-03-28 12:32:29 +00:00
Roman Nozdrin
2b4946f53a Revert "MCOL-4576: remove S3 options from cpimport. (#2307)"
This reverts commit 14c4840d53097a77cb7e5ce67d8c399f217fe32c.
2022-03-23 12:33:23 +00:00
benthompson15
14c4840d53
MCOL-4576: remove S3 options from cpimport. (#2307) 2022-03-21 09:54:39 -05:00
Serguey Zefirov
53b9a2a0f9 MCOL-4580 extent elimination for dictionary-based text/varchar types
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.

The actual patch does have all the code there but miss one important
step: we do not keep collation index, we keep charset index. Because of
this, some of the tests in the bugfix suite fail and thus main
functionality is turned off.

The reason of this patch to be put into PR at all is that it contains
changes that made CHAR/VARCHAR columns unsigned. This change is needed in
vectorization work.
2022-03-02 23:53:39 +03:00
Gagan Goel
973e5024d8 MCOL-4957 Fix performance slowdown for processing TIMESTAMP columns.
Part 1:
 As part of MCOL-3776 to address synchronization issue while accessing
 the fTimeZone member of the Func class, mutex locks were added to the
 accessor and mutator methods. However, this slows down processing
 of TIMESTAMP columns in PrimProc significantly as all threads across
 all concurrently running queries would serialize on the mutex. This
 is because PrimProc only has a single global object for the functor
 class (class derived from Func in utils/funcexp/functor.h) for a given
 function name. To fix this problem:

   (1) We remove the fTimeZone as a member of the Func derived classes
   (hence removing the mutexes) and instead use the fOperationType
   member of the FunctionColumn class to propagate the timezone values
   down to the individual functor processing functions such as
   FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc.

   (2) To achieve (1), a timezone member is added to the
   execplan::CalpontSystemCatalog::ColType class.

Part 2:
 Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime()
 and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds
 since unix epoch and broken-down representation. These functions in turn call
 the C library function localtime_r() which currently has a known bug of holding
 a global lock via a call to __tz_convert. This significantly reduces performance
 in multi-threaded applications where multiple threads concurrently call
 localtime_r(). More details on the bug:
   https://sourceware.org/bugzilla/show_bug.cgi?id=16145

 This bug in localtime_r() caused processing of the Functors in PrimProc to
 slowdown significantly since a query execution causes Functors code to be
 processed in a multi-threaded manner.

 As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime()
 and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion
 (done in dataconvert::timeZoneToOffset()) during the execution plan
 creation in the plugin. Note that localtime_r() is only called when the
 time_zone system variable is set to "SYSTEM".

 This fix also required changing the timezone type from a std::string to
 a long across the system.
2022-02-14 14:12:27 -05:00
Leonid Fedorov
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
Leonid Fedorov
01f3ceb437 replace header guards with #pragma once 2022-01-21 15:24:58 +00:00
Roman Nozdrin
550d1cf1c4 MCOL-4858 This patch fixes HWM comparison for 16 columns and reduces boilerplate code for other column widths 2021-09-06 17:10:33 +00:00
Leonid Fedorov
5c5f103f98
MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
Denis Khalikov
cc1c3629c5 MCOL-987 Add LZ4 compression.
* Adds CompressInterfaceLZ4 which uses LZ4 API for compress/uncompress.
* Adds CMake machinery to search LZ4 on running host.
* All methods which use static data and do not modify any internal data - become `static`,
  so we can use them without creation of the specific object. This is possible, because
  the header specification has not been modified. We still use 2 sections in header, first
  one with file meta data, the second one with pointers for compressed chunks.
* Methods `compress`, `uncompress`, `maxCompressedSize`, `getUncompressedSize` - become
  pure virtual, so we can override them for the other compression algos.
* Adds method `getChunkMagicNumber`, so we can verify chunk magic number
  for each compression algo.
* Renames "s/IDBCompressInterface/CompressInterface/g" according to requirement.
2021-07-06 18:04:37 +03:00
Alexander Barkov
b3d6f62964 MCOL-4753 Performance problem in Typeless join 2021-06-10 09:26:26 +00:00
Roman Nozdrin
c6d0b46bc6
Merge pull request #1984 from denis0x0D/MCOL-4685_fix_warn
MCOL-4685: Fix GCC warnings.
2021-06-10 11:39:10 +03:00
Alexey Antipovsky
0dedb7e628 Fix compilation warnings 2021-06-09 16:51:00 +03:00
Denis Khalikov
2cd024145c MCOL-4685: Fix GCC warnings.
This patch fixes GCC warnings:
1.'const long int' and 'long unsigned int' [-Werror=sign-compare]
2. unused variable 'cf' [-Werror=unused-variable]
2021-06-09 14:33:32 +03:00
Denis Khalikov
caa02a383a MCOL-4685 Fix bug after rebase and add comments. 2021-06-07 13:00:11 +03:00
Denis Khalikov
606194e6e4 MCOL-4685: Eliminate some irrelevant settings (uncompressed data and extents per file).
This patch:
1. Removes the option to declare uncompressed columns (set columnstore_compression_type = 0).
2. Ignores [COMMENT '[compression=0] option at table or column level (no error messages, just disregard).
3. Removes the option to set more than 2 extents per file (ExtentsPreSegmentFile).
4. Updates rebuildEM tool to support up to 10 dictionary extent per dictionary segment file.
5. Adds check for `DBRootStorageType` for rebuildEM tool.
6. Renamed rebuildEM to mcsRebuildEM.
2021-06-03 14:44:33 +03:00
Sergey Zefirov
04d5b55c37 MCOL-4652 Fixes for wide-decimal support in bulk insert operations
Previously cpimport didn't send wide min/max-es talking to BRM
2021-05-31 18:40:57 +00:00
Alexander Barkov
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a dirt in MariaDB headers.
  Should be fixed in the server code. my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  in some cases. The former is not CMake compatible style. The latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.

Observations:
- It's OK to include both mcsconfig.h and my_config.h providing that we
  suppress duplicate definition of the above conflicting types #1 and #2.
- There is no a need to suppress duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuation header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceeding configuration header.

This change make sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (with their later restoring).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now only be included only after my_config.h
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be only included after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcscofig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h now can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    #    Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the fix of the complitation failure it appeared that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
Denis Khalikov
5d497e8821 MCOL-4566: Add rebuildEM tool support to work with compressed files.
* This patch adds rebuildEM tool support to work with compressed files.
* This patch increases a version of the file header.

Note: Default version of the `rebuildEM` tool was using very old API,
those functions are not present currently. So `rebuildEM` will not work with
files created without compression, because we cannot deduce some info which are
needed to create column extent.
2021-04-02 10:55:01 +03:00
Sergey Zefirov
de8ef1eb2d A test to investigate problems, without expected results
Debug logs

Debug logs - fix typo

Debug logs

Debug logs

debug logs

debug logs

debug logs

debug logs

debug logs

debug logs

debug logs

debug logs

Solution to test

Remove debug logs
2021-03-24 15:31:48 +03:00
Denis Khalikov
a2efa1efeb MCOL-4566: Extend CompressedDBFileHeader struct with new fields.
* This patch extends CompressedDBFileHeader struct with new fields:
  `fColumWidth`, `fColDataType`, which are necessary to rebuild extent map
  from the given file. Note: new fields do not change the memory
  layout of the struct, because the size is calculated as
  max(sizeof(CompressedDBFileHeader), HDR_BUF_LEN)).

* This patch changes API of some functions, by adding new function
  argument `colDataType` when needed, to be able to call `initHdr`
  function with colDataType value.
2021-03-05 22:15:34 +03:00
benthompson15
afa88866bb MCOL-4483: Fix and consolidate log files and cpimport logging. 2021-02-12 15:40:16 -06:00
Roman Nozdrin
5fce19df0a MCOL-4412 Introduce TypeHandler::getEmptyValueForType to return const ptr for an empty value
WE changes for SQL DML and DDL operations

Changes for bulk operations

Changes for scanning operations

Cleanup
2021-01-18 12:30:17 +00:00
Gagan Goel
f6b55c1e18 MCOL-4177 Add support for bulk insertion for wide decimals.
1. This patch adds support for wide decimals with/without scale
     to cpimport. In addition, INSERT ... SELECT and LDI are also
     now supported.
  2. Logic to compute the number of bytes to convert a binary
     representation in the buffer to a narrow decimal is also
     simplified.
2020-12-15 22:14:54 +00:00
Roman Nozdrin
58495d0d2f MCOL-4387 Convert dataconvert::decimalToString() into VDecimal and TSInt128 methods 2020-11-18 13:53:16 +00:00
Alexander Barkov
d5c6645ba1 Adding mcs_basic_types.h
For now it consists of only:

using int128_t = __int128;
using uint128_t = unsigned __int128;

All new privitive data types should go into this file in the future.
2020-11-18 13:53:15 +00:00
Alexander Barkov
129d5b5a0f MCOL-4174 Review/refactor frontend/connector code 2020-11-18 13:53:15 +00:00
Gagan Goel
d3bc68b02f MCOL-641 Refactor initial extent elimination support.
This commit also adds support in TupleHashJoinStep::forwardCPData,
although we currently do not support wide decimals as join keys.

Row estimation to determine large-side of the join is also updated.
2020-11-18 13:52:19 +00:00
Roman Nozdrin
8c02802ac1 MCOL-641 Fix the cpimport issue when tables contains wide-DECIMAL and other types. 2020-11-18 13:51:26 +00:00
Gagan Goel
74b64eb4f1 MCOL-641 1. Add support for int128_t in ParsedColumnFilter.
2. Set Decimal precision in SimpleColumn::evaluate().
3. Add support for int128_t in ConstantColumn.
4. Set IDB_Decimal::s128Value in buildDecimalColumn().
5. Use width 16 as first if predicate for branching based on decimal width.
2020-11-18 13:47:45 +00:00
Gagan Goel
9b714274db MCOL-641 1. Minor refactoring of decimalToString for int128_t.
2. Update unit tests for decimalToString.
3. Allow support for wide decimal in TupleConstantStep::fillInConstants().
2020-11-18 13:47:44 +00:00
Gagan Goel
824615a55b MCOL-641 Refactor empty value implementation in writeengine. 2020-11-18 13:47:44 +00:00
Gagan Goel
55afcd8890 MCOL-641 Basic extent elimination support for Decimal38. 2020-11-18 13:47:01 +00:00
Alexey Antipovsky
f4a6294c95 Fix ignored qualifiers warnings (-Wignored-qualifiers) 2020-11-17 15:03:10 +03:00
Roman Nozdrin
328ae25650 MCOL-4328 There is a new option in both cpimport and cpimport.bin to asign
an owner for all data files created by cpimport

The patch consists of two parts: cpimport.bin changes, cpimport splitter
changes

cpimport.bin computes uid_t and gid_t early and propagates it down the stack
where MCS creates data files
2020-10-03 14:05:29 +00:00
David Hall
dac4fccc48 MCOL-4306 don't compare string as signed int 2020-09-22 14:59:20 -05:00