1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-04-20 09:07:44 +03:00

24 Commits

Author SHA1 Message Date
Leonid Fedorov
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
Leonid Fedorov
f5b2a6885f MCOL-5013: Load Data from S3 into Columnstore
Introduced UDF and stored prodecure.
usage:

set columnstore_s3_key='<s3_key>';
set columnstore_s3_secret='<s3_secret>';
set columnstore_s3_region='region';

and then use UDF
select columnstore_dataload("<tablename>", "<filename>", "<bucket>", "<db_name>");
for UDF db_name can be ommited, then current connection db will be used

or stored function
call calpontsys.columnstore_load_from_s3("<tablename>", "<filename>", "<bucket>", "<db_name>");
2022-07-04 19:52:37 +03:00
Roman Nozdrin
4c26e4f960 MCOL-4912 This patch introduces Extent Map index to improve EM scaleability
EM scaleability project has two parts: phase1 and phase2.
        This is phase1 that brings EM index to speed up(from O(n) down
        to the speed of boost::unordered_map) EM lookups looking for
        <dbroot, oid, partition> tuple to turn it into LBID,
        e.g. most bulk insertion meta info operations.
        The basis is boost::shared_managed_object where EMIndex is
        stored. Whilst it is not debug-friendly it allows to put a
        nested structs into shmem. EMIndex has 3 tiers. Top down description:
        vector of dbroots, map of oids to partition vectors, partition
        vectors that have EM indices.
        Separate EM methods now queries index before they do EM run.
        EMIndex has a separate shmem file with the fixed id
        MCS-shm-00060001.
2022-05-04 12:59:16 +00:00
benthompson15
2ec502aaf3
MCOL-4576: remove S3 options from cpimport. (#2328) 2022-04-01 09:18:34 -05:00
Roman Nozdrin
2b4946f53a Revert "MCOL-4576: remove S3 options from cpimport. (#2307)"
This reverts commit 14c4840d53097a77cb7e5ce67d8c399f217fe32c.
2022-03-23 12:33:23 +00:00
benthompson15
14c4840d53
MCOL-4576: remove S3 options from cpimport. (#2307) 2022-03-21 09:54:39 -05:00
Gagan Goel
973e5024d8 MCOL-4957 Fix performance slowdown for processing TIMESTAMP columns.
Part 1:
 As part of MCOL-3776 to address synchronization issue while accessing
 the fTimeZone member of the Func class, mutex locks were added to the
 accessor and mutator methods. However, this slows down processing
 of TIMESTAMP columns in PrimProc significantly as all threads across
 all concurrently running queries would serialize on the mutex. This
 is because PrimProc only has a single global object for the functor
 class (class derived from Func in utils/funcexp/functor.h) for a given
 function name. To fix this problem:

   (1) We remove the fTimeZone as a member of the Func derived classes
   (hence removing the mutexes) and instead use the fOperationType
   member of the FunctionColumn class to propagate the timezone values
   down to the individual functor processing functions such as
   FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc.

   (2) To achieve (1), a timezone member is added to the
   execplan::CalpontSystemCatalog::ColType class.

Part 2:
 Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime()
 and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds
 since unix epoch and broken-down representation. These functions in turn call
 the C library function localtime_r() which currently has a known bug of holding
 a global lock via a call to __tz_convert. This significantly reduces performance
 in multi-threaded applications where multiple threads concurrently call
 localtime_r(). More details on the bug:
   https://sourceware.org/bugzilla/show_bug.cgi?id=16145

 This bug in localtime_r() caused processing of the Functors in PrimProc to
 slowdown significantly since a query execution causes Functors code to be
 processed in a multi-threaded manner.

 As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime()
 and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion
 (done in dataconvert::timeZoneToOffset()) during the execution plan
 creation in the plugin. Note that localtime_r() is only called when the
 time_zone system variable is set to "SYSTEM".

 This fix also required changing the timezone type from a std::string to
 a long across the system.
2022-02-14 14:12:27 -05:00
Leonid Fedorov
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
Alexander Barkov
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a dirt in MariaDB headers.
  Should be fixed in the server code. my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  in some cases. The former is not CMake compatible style. The latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.

Observations:
- It's OK to include both mcsconfig.h and my_config.h providing that we
  suppress duplicate definition of the above conflicting types #1 and #2.
- There is no a need to suppress duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuation header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceeding configuration header.

This change make sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (with their later restoring).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now only be included only after my_config.h
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be only included after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcscofig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h now can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    #    Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the fix of the complitation failure it appeared that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
benthompson15
afa88866bb MCOL-4483: Fix and consolidate log files and cpimport logging. 2021-02-12 15:40:16 -06:00
Roman Nozdrin
328ae25650 MCOL-4328 There is a new option in both cpimport and cpimport.bin to asign
an owner for all data files created by cpimport

The patch consists of two parts: cpimport.bin changes, cpimport splitter
changes

cpimport.bin computes uid_t and gid_t early and propagates it down the stack
where MCS creates data files
2020-10-03 14:05:29 +00:00
Gagan Goel
2ba9263df4 Silence -Werror=implicit-fallthrough compiler errors - Patch from Monty.
The patch also fixes some potential bugs due to missing break
statements.
2020-06-26 12:32:57 -04:00
David Hall
06e50e0926 MCOL-3536 collation 2020-05-26 12:42:11 -05:00
David Hall
1f3d1e6fd6 MCOL-3536 collation 2020-05-14 16:02:49 -05:00
Andrew Hutchings
8633859dd4 MCOL-3514 Add support for S3 to cpimport
cpimport now has the ability to use libmarias3 to read an object from an
S3 bucket instead of a file on local disk.

This also moves libmarias3 to utils/libmarias3.
2019-09-24 10:31:22 +01:00
Andrew Hutchings
7b006b6e74 MCOL-104 Remove InfiniDB Engine
Remove the InfiniDB engine which is a duplicate of the ColumnStore
engine.
2019-08-09 11:51:55 +01:00
Andrew Hutchings
5e4f1b9933
Merge branch 'develop' into MCOL-265 2019-06-10 13:58:03 +01:00
Roman Nozdrin
e12a2acd53 MCOL-537 Regression test doesn't tolerate 'failed' in stderr, stdout.
I reformulate the messages.

    Changed version in preprocessor conditions to avoid compilation
    warnings in Debian 9.

    Disabled sign-compare check for generated files in DML/DDL.
2019-05-20 18:30:52 +03:00
Roman Nozdrin
b2436502cb MCOL-537 Enabled -Wno-unused-result for OAM code.
Fixed pragmas that disables compilation checks.

    DDLProc now returns an error if it couldn't cwd.

    Use either auto_ptr or unique_ptr depending on GCC version.
2019-05-08 19:44:01 +03:00
Roman Nozdrin
7e2cb05624 MCOL-537 There are no CS-specific warnings building with gcc 8.2. 2019-05-07 16:00:05 +03:00
Gagan Goel
e89d1ac3cf MCOL-265 Add support for TIMESTAMP data type 2019-04-23 00:00:09 -04:00
Andrew Hutchings
01446d1e22 Reformat all code to coding standard 2017-10-26 17:18:17 +01:00
david hill
7d8de28b43 MCOL-59, change calpont.xml 2016-06-22 16:00:00 -05:00
david hill
f6afc42dd0 the begginning 2016-01-06 14:08:59 -06:00