1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-01 06:21:41 +03:00
Commit Graph

52 Commits

Author SHA1 Message Date
0747099456 Update on review comments 2023-11-30 01:47:13 +04:00
491ba6e0aa Segfault dummy fix
There are two different variables binaryArray and fixedSizeBinaryArray
initilized accordingly to columnData->type_id()

for columnData->type_id() equaled to  arrow::Type::type::FIXED_SIZE_BINARY
not initilized binaryArray was used below. The code, taking care about this two branches was commented.
This part looks quite messy and needs refactoring
2023-11-30 01:47:13 +04:00
6b8c3dd918 Fix arrow linkage on external project 2023-11-30 01:47:13 +04:00
fe597ec78c MCOL-5505 add parquet support for cpimport and add mcs_parquet_ddl and mcs_parquet_gen tools 2023-11-30 01:47:13 +04:00
7f9c624626 MCOL-5573 Fix cpimport truncation of TEXT columns.
1. Restore the utf8_truncate_point() function in utils/common/utils_utf8.h
that I removed as part of the patch for MCOL-4931.

2. As per the definition of TEXT columns, the default column width represents
the maximum number of bytes that can be stored in the TEXT column. So the
effective maximum length is less if the value contains multi-byte characters.
However, if the user explicitly specifies the length of the TEXT column in a
table DDL, such as TEXT(65535), then the DDL logic ensures that enough number
of bytes are allocated (upto a system maximum) to allow upto that many number
of characters (multi-byte characters if the charset for the column is multi-byte,
such as utf8mb3).
2023-09-20 12:23:22 -04:00
931f2b36a1 MCOL-4931 Make cpimport charset-aware. (#2938)
1. Extend the following CalpontSystemCatalog member functions to
   set CalpontSystemCatalog::ColType::charsetNumber, after the
   system catalog update to add charset number to calpontsys.syscolumn
   in MCOL-5005:
     CalpontSystemCatalog::lookupOID
     CalpontSystemCatalog::colType
     CalpontSystemCatalog::columnRIDs
     CalpontSystemCatalog::getSchemaInfo

2. Update cpimport to use the CHARSET_INFO object associated with the
   charset number retrieved from the system catalog, for a
   dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate
   long strings that exceed the target column character length.

3. Add MTR test cases.
2023-09-05 17:17:20 +03:00
b53c231ca6 MCOL-271 empty strings should not be NULLs (#2794)
This patch improves handling of NULLs in textual fields in ColumnStore.
Previously empty strings were considered NULLs and it could be a problem
if data scheme allows for empty strings. It was also one of major
reasons of behavior difference between ColumnStore and other engines in
MariaDB family.

Also, this patch fixes some other bugs and incorrect behavior, for
example, incorrect comparison for "column <= ''" which evaluates to
constant True for all purposes before this patch.
2023-03-30 21:18:29 +03:00
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
01f3ceb437 replace header guards with #pragma once 2022-01-21 15:24:58 +00:00
5c5f103f98 MCOL-4839: Fix clang build (#2100)
* Fix clang build

* Extern C returned to plugin_instance

Co-authored-by: Leonid Fedorov <l.fedorov@mail.corp.ru>
2021-08-23 10:45:10 -05:00
237cad347f MCOL-4758 Limit LONGTEXT and LONGBLOB to 16MB (#1995)
MCOL-4758 Limit LONGTEXT and LONGBLOB to 16MB

Also add the original test case from MCOL-3879.
2021-07-05 02:09:41 -04:00
9608533d92 MCOL-4734 Compilation failure: MariaDB-10.6 + ColumnStore-develop
mcsconfig.h and my_config.h have the following
pre-processor definitions:

1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION

2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a dirt in MariaDB headers.
  Should be fixed in the server code. my_config.h erroneously
  performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
  in some cases. The former is not CMake compatible style. The latter is.

3. Non-conflicting definitions:
  Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
  because both are generated by cmake on the same host machine. So
  they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.

Observations:
- It's OK to include both mcsconfig.h and my_config.h providing that we
  suppress duplicate definition of the above conflicting types #1 and #2.
- There is no a need to suppress duplicate definitions mentioned in #3,
  as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuation header,
  either my_config.h or mcsconfig.h (or both).
  They must never be included without any preceeding configuration header.

This change make sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
  at the same time
- or by hiding conflicting definitions #1 and #2
  (with their later restoring).
- also, by making sure that my_sys.h and m_ctype.h always follow
  a CMake configuration file.

Details:
- idb_mysql.h can now only be included only after my_config.h
  An attempt to use idb_mysql.h with mcsconfig.h instead of
  my_config.h is caught by the "#error" preprocessor directive.

- mariadb_my_sys.h can now be only included after mcsconfig.h.
  An attempt to use mariadb_my_sys.h without mcscofig.h
  (e.g. with my_config.h) is also caught by "#error".

- collation.h now can now be included in two ways.
  It now has the following effective structure:

    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    //  Remember current conflicting definitions on the preprocessor stack
    //  Undefine current conflicting definitions
    #endif
    #include "mcsconfig.h"
    #include "m_ctype.h"
    #if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
    #    Restore conflicting definitions from the preprocessor stack
    #endif

  and can be included as follows:

  a. using only mcsconfig.h as a configuration header:

    // my_config.h must not be included so far
    #include "collation.h"

  b. using my_config.h as the first included configuration file:

    #define PREFER_MY_CONFIG_H // Force conflict resolution
    #include "my_config.h"     // can be included directly or indirectly
    ...
    #include "collation.h"

Other changes:

- Adding helper header files
     utils/common/mcsconfig_conflicting_defs_remember.h
     utils/common/mcsconfig_conflicting_defs_restore.h
     utils/common/mcsconfig_conflicting_defs_undef.h
  to perform conflict resolution easier.

- Removing `#include "collation.h"` from a number of files,
  as it's automatically included from rowgroup.h.

- Removing redundant `#include "utils_utf8.h"`.
  This change is not directly related to the problem being fixed,
  but it's nice to remove redundant directives for both collation.h
  and utils_utf8.h from all the files that do not really need them.
  (this change could probably have gone as a separate commit)

- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
  After the fix of the complitation failure it appeared that ColumnStore
  services compiled with the debug build crash due to recent changes in
  safemalloc. The crash happened in strcmp() with `my_progname` as an argument
  (where my_progname is a mysys global variable). This problem should
  probably be fixed on the server side as well to avoid passing NULL.
  But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
  rather than my_init(). So let's make MCS do like the other programs do.
2021-05-25 12:34:36 +04:00
5d497e8821 MCOL-4566: Add rebuildEM tool support to work with compressed files.
* This patch adds rebuildEM tool support to work with compressed files.
* This patch increases a version of the file header.

Note: Default version of the `rebuildEM` tool was using very old API,
those functions are not present currently. So `rebuildEM` will not work with
files created without compression, because we cannot deduce some info which are
needed to create column extent.
2021-04-02 10:55:01 +03:00
0e29b0b0f9 Fix -Wtype-limits 2020-11-17 15:03:10 +03:00
6f120d2637 MCOL-4328 MCS avoids chown() calls for files that are on S3
MCS now chowns created directories hierarchy not only files and
immediate parent directories

Minor changes to cpimport's help printout

cpimport's -f option is now mandatory with mode 2
2020-10-09 11:02:31 +00:00
328ae25650 MCOL-4328 There is a new option in both cpimport and cpimport.bin to asign
an owner for all data files created by cpimport

The patch consists of two parts: cpimport.bin changes, cpimport splitter
changes

cpimport.bin computes uid_t and gid_t early and propagates it down the stack
where MCS creates data files
2020-10-03 14:05:29 +00:00
eac7dab096 MCOL-4030: first commit of warning removals unneed const and missing virtual dtors. 2020-06-23 13:51:36 -05:00
1f3d1e6fd6 MCOL-3536 collation 2020-05-14 16:02:49 -05:00
e8837f7137 MCOL-3605 Fix Block Filling 2019-11-13 21:28:04 +00:00
04b7a6af6c MCOL-2276 Fix 2019-11-13 21:26:53 +00:00
3fef0f21d3 Remove vpj files
They shouldn't be here
2019-09-05 17:38:03 +01:00
811909aa72 Merge branch 'develop-1.2' into develop-merge-up-20190729 2019-07-29 12:19:26 +01:00
6bee5f9f61 MCOL-3395 Always clear dctnry cache on close
The dictionary cache needs to be cleared every time it is closed so that
on bulk insert we don't mix the cache between columns.
2019-07-03 08:46:43 +01:00
020b211bb7 Merge branch 'develop-1.2' into develop-merge-up-20190514 2019-05-14 13:58:33 +01:00
9dc33c4e82 Another try to cope with warnings under gcc 8.2. 2019-04-29 11:05:03 +03:00
784bbe09d4 Merge branch 'develop-1.2' into develop-merge-up-20190425 2019-04-25 10:27:59 +01:00
22c0c98e61 MCOL-498 Reduced number of blocks created for abbreviated extents
thus reduced IO load when creating a table.
    Uncompressed abbreviated segment and dicts aren't affected by
    this b/c CS'es system catalog uses uncompressed dict files. CS
    now doesn't work with empty dicts files.
2019-04-22 20:02:04 +03:00
bc3c780e35 MCOL-498 Revived unit tests for writeengine/shared and add new tests
for extent extention.
Added a getter, moved some methods from protected into public to use
with unit tests, e.g createFile, setPreallocSpace. Added code stub in
FileOp::oid2FileName to use with UT.
2019-04-22 20:02:00 +03:00
abf7ef80c2 MCOL-498 Changes made according with review suggestions.
Add more comments.
    Changed return value for HDFS'es fallocate.
    Removed unnecessary code in ColumnBufferCompressed::writeToFile
    Replaced Nulls with Empties in variable names.
2019-04-22 20:01:50 +03:00
cbdcdb9f10 MCOL-498 Add DBRootX.PreallocSpace setting in the XML. Dict files extents now contain a correct number of blocks available. 2019-04-22 20:01:43 +03:00
81fe7fa1a9 MCOL-498. Add the knob to disable segment|dict file preallocation. Dict files extension uses fallocate() if possible. 2019-04-22 20:01:14 +03:00
06f24df724 change signature array in a std::set ! lookup performance is now log(N). About 10x performance can be expected on cpimport containing varchars.
Signed-off-by: Patrice Linel <plinel@mendel-master2.cm.cluster>
2019-04-18 09:16:00 -04:00
176ef2f2c1 MCOL-1793 Add udafContext to the copy constructor of WindowFunctionColumn. 2018-11-23 12:42:29 -06:00
59d0a45da3 Merge branch 'develop-1.1' into 1.1-merge-up 2017-12-12 20:26:00 +00:00
37f673d121 Merge branch 'develop-1.0' into 1.0-merge-up 2017-11-30 15:09:11 +00:00
3d5bd3809c MCOL-444 Truncate UTF8 correctly
cpimport would truncate UTF8 data half way through a character which
would cause problems for functions using that data. This patch
calculates the correct truncation point when inserting the data.
2017-11-29 10:43:57 +00:00
932819ba23 Merge branch 'develop-1.1' into merge-up-dev 2017-11-24 11:10:09 +02:00
21ca6ca42f MCOL-1021 Fix dictionary de-duplication cache
The signatures were getting destroyed whilst processing before being
used in the dictionary de-duplication cache making the cache full of
empty strings. This fix resets the signature after insertion for the
cache.

This fix also lets the signature cache be read when there is 1000
entries in it.
2017-11-09 17:28:21 +00:00
01446d1e22 Reformat all code to coding standard 2017-10-26 17:18:17 +01:00
3330495a2e MCOL-777 Cleanup source
Clean out autotools and some other things from the source tree.
2017-08-07 15:59:56 +01:00
785e6c91bd MCOL-670 Fix UPDATE with BLOB/TEXT
* Don't cache > 8000 bytes during update
* Fix PrimProc case where token is used more than once
2017-04-19 22:45:23 +01:00
1892ac8681 MCOL-267 Bulk write & PrimProc fixes 2017-03-20 21:26:53 +00:00
093aa377e5 MCOL-267 multi-block support for PrimProc and bulk
* Adds multi-block bulk write support
* Adds PrimProc multi-block read support
* Allows the functions length() and hex() to work with BLOB columns
2017-03-20 18:32:24 +00:00
aea729fe7d MCOL-267 DML support
* DML writes for multi-block dictionary (blob) now works
* PrimProc fixed so that the first block in multi-block is read
correctly
* Performance optimisation (removed string copy into stack) for new
dictionary entries
2017-03-18 14:31:29 +00:00
424628349b Add CMake build tree files 2016-07-15 10:49:57 -05:00
d90af9496e Remove Makefile.in and update gitignore 2016-07-15 10:49:57 -05:00
22b7b3d1ef [MCOL-69] - autotools bootstrap only needed on new release version 2016-06-15 04:46:10 -04:00
185d1a780c [MCOL-69] Remove Makefile.in files (should be generated with autoreconf) 2016-05-30 07:48:12 -04:00
25093d4dd0 [MCOL-69] Fix autotools build process
Remove generated Makefiles
Update Makefile.am to specify RPATH as a subdirectory of --prefix
Remove configure artifacts such as config.log, config.h, etc
Remove unneeded backup files (files ending in tilde ~)
2016-05-30 07:41:56 -04:00