mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00

Author	SHA1	Message	Date
Denis Khalikov	5f07828619	MCOL-5522 Properly process pm join result count. This patch: 1. Properly processes the situation when the pm join result count is exceeded. 2. Adds the session variable `columnstore_max_pm_join_result_count` to control the limit.	2024-01-10 18:16:39 +04:00
Gagan Goel	10f1a7abbc	MCOL-5480 LOAD DATA INFILE incorrectly loads values for MEDIUMINT datatype. Internal memory representation of MEDIUMINT datatype uses 24 bits. This is true for both MariaDB server as well as ColumnStore. MCS plugin code uses TypeHandlerSInt24 and TypeHandlerUInt24 classes to respectively convert the binary representation of the signed and unsigned MEDIUMINT values passed by the server to the plugin. The plugin then outputs the text representation of these values into an open file descriptor which is piped to cpimport for the final load into the MCS db files. The TypeHandlerXInt24 classes were earlier incorrectly using WriteBatchField::ColWriteBatchXInt32() functions which operate on a 4 byte buffer. This resulted in incorrect parsing of MEDIUMINT values. As a fix, we implement WriteBatchField::ColWriteBatchXInt24() functions which correctly handle the 24 bit input buffer used for MEDIUMINT datatype.	2023-05-19 18:30:52 -04:00
Gagan Goel	982db10f10	MCOL-5357 Fix TPC-DS query error "MCS-3009: Unknown column '.<colname>'". For the following query: select item from ( select item from (select a as item from t1) tt union all select item from (select a as item from t1) tt ) ttt; There is an if predicate in buildSimpleColFromDerivedTable() that compares the outermost query field name (ttt.item) to the returned column list of the inner query (tt.item) when building the returned column list of the outer most query. In the above query example, the inner query field name is an alias set in the inner most query and is set to "`tt`.`item`", while the outermost query field name is set to "item". The use of backticks "`" in the inner query alias is causing the execution to not enter the if block which creates the SimpleColumn for the outermost query field name. As a fix, we strip off the backticks from the inner query alias.	2023-05-02 15:43:10 -04:00
Gagan Goel	55d4214429	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. (#2823 ) 1. Input and output RowGroup's used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits causing the OS OOM to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is an uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to an uint32_t.	2023-04-22 00:43:29 +03:00
Leonid Fedorov	2f153184c3	Fixes of bugs from ASAN warnings, part one (#2796 )	2023-03-30 18:29:04 +03:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
Roman Nozdrin	4d4e4ad30d	Merge pull request #2741 from mariadb-corporation/MDEV-25080-CS-dev MDEV-25080 Allow pushdown of queries involving UNIONs in outer select to ColumnStore	2023-02-28 11:23:50 +00:00
Andrey Piskunov	b6808c97f1	MCOL-4530: common conjunction top rewrite (#2673) Added a logical transformation of execplan::ParseTrees that takes out the common factor in expressions of the form "(A and B) or (A and C)", for the purpose of passing the TPCH 19 query. Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>	2023-02-27 19:23:19 +03:00
Gagan Goel	2f1f9c0ef0	MDEV-25080 Some fixes: 1. In TupleUnion::writeNull(), add the missing switch case for wide decimal with 16bytes column width. 2. MCOL-5432 Disable complete/partial pushdown of UNION operation if the query involves an ORDER BY or a LIMIT clause, until MCOL-5222 is fixed. Also add MTR test cases for this.	2023-02-27 06:38:31 -05:00
Gagan Goel	e4100928d1	MDEV-25080 DISABLE pushdown of SELECT_LEX_UNIT for the prepare phase of PS/SP statements.	2023-02-27 06:38:31 -05:00
Gagan Goel	86dcf92d56	MCOL-5215 Fix overflow of UNION operation involving DECIMAL datatypes. When a UNION operation involving DECIMAL datatypes with scale and digits before the decimal exceeds the currently supported maximum precision of 38, we throw an error to the user: "MCS-2060: Union operation exceeds maximum DECIMAL precision of 38". This is until MCOL-5417 is implemented where ColumnStore will have full parity with MariaDB server in terms of maximum supported DECIMAL precision and scale of 65 and 38 digits respectively.	2023-02-27 06:38:31 -05:00
Gagan Goel	8cdcae0d2f	MDEV-25080 Disable pushdown of SELECT_LEX_UNIT for CREATE VIEW statements.	2023-02-27 06:38:31 -05:00
Gagan Goel	45a779f743	MDEV-25080 Implement ColumnStore-side changes for pushdown of SELECT_LEX_UNITs.	2023-02-27 06:38:31 -05:00
Roman Nozdrin	ff534dba7f	MCOL-5384 This commit replaces shared pointer to CSC with CSC ctor that is cleaned up leaving a scope CSC default ctor was private b/c it must not allow to use CSC outside thread cache. However there are some places in the plugin code that need a standalone syscat that is cleaned up leaving the scope. The decision is to make the restriction mentioned organizational rather than syntactical.	2023-02-08 14:03:41 +00:00
Roman Nozdrin	1b51d265ed	MCOL-5400 Disable group by pushdown	2023-01-26 12:09:00 +00:00
Roman Nozdrin	ebe9bd0aa5	Merge pull request #2670 from denis0x0D/MCOL-5195 MCOL-5195 Correlated subquery with equi/non-equi scalar filter and join condition	2023-01-19 13:35:08 +03:00
Sam James	20b5dbb617	Add missing includes These seem to have all fallen out of a recent Boost update to 1.81 which dropped some internal includes. All of these uses within columnstore relied on these transitive includes, so explicitly include what we need to fix build. Signed-off-by: Sam James <sam@gentoo.org>	2023-01-17 01:18:41 +00:00
Leonid Fedorov	81f0334698	Connection resource cleaning by Karol Roslaniec	2023-01-13 16:35:12 +03:00
Leonid Fedorov	d42485656c	Fix clang 16 warnings for comfort build	2023-01-12 22:11:28 +03:00
David.Hall	53af74b027	MCOL-1170 Fix ANALYZE to not error (#2682) Analyze needs to be completed differently than a normal query. In server, when an ANALYZE is seen, it calls init_scan() immediately followed by end_scan(). This leaves the sqlfrontendsession (ExeMgr) in a state where it expects to return rows. This patch fixes end_scan to clean this up via reads and writes to get everything back in sync. ANALYZE should display the number of rows that would be displayed if the query were run normally. We have that information available, but no way to return it. A modification to the server side to ask for that in the handler is required. This patch also includes a beautification of sqlfrontsessionthread.cpp since it looked bad. The important change is at line 774, if (!swallowRows), which short-circuits the actual return of data.	2023-01-09 13:59:26 -06:00
Roman Nozdrin	4313288a85	Merge 22.08.7 (#2678) * fix C API includes ColumnStore used to include server's mysql.h but link all tools with libmariadb.so There's no guarantee that this would work, even with workarounds it had in dbcon/mysql/sm.cpp Fix: * tools (linked with libmariadb.so) must include libmariadb's mysql.h * as a hack prevent service_thd_timezone.h from being loaded into tools, as it conflicts with libmariadb's mysql.h * server plugin must include server's mysql.h * also don't link every tool with libmariadb.so, link the helper library (liblibmysqlclient.so) that actually needs it, tools use this helper library, not libmariadb.so directly * do not link ha_columnstore.so with libmariadb.so this means some libraries have to be compiled twice - for tools with libmariadb.so and for plugin, without. * use system boost, if possible boost 1.71.0 is what ubuntu focal has, so let's start with that version. boost 1.77.0 is the first that supports c++20 * add dependency for generated header files errorids.h messageids.h see `3edd51610` * bump the version * MCOL-5322 This patch replaces boost::mutex with std::mutex b/c IMHO std::unique_lock::lock is less troublesome comparing with the boost alternative * MCOL-5310 This patch replaces move-assignment with copy-assignment to avoid memory corruption (#2661) * Bump VERSION to 22.08.7-1 * MCOL-5306 Re-read the config (Columnstore.xml) file if it was updated. The existing implementation of Config::makeConfig() factory method was returning a possibly stale config to the caller, without checking if the config file was updated since the last read. This bug triggered a scenario as described in MCOL-5306 where after a failover in an MCS cluster, the controllernode coordinates changed in the config file after failover and the existing mariadbd process was still using the old controllernode coordinates. This led to a failed network connection between mariadbd and the new controllernode.
The change in this fix, however, is more generic and not just limited to this above scenario. * MCOL-5264 This patch replaces boost mutex locks with std analogs boost::unique_lock dtor calls a fancy unlock logic that throws twice. First if the mutex is 0 and second lock doesn't own the mutex. The first condition failure causes unhandled exception for one of the clients in DEC::writeToClient(). I was unable to find out why Linux can have a 0 mutex and replaced boost::mutex with std::mutex b/c stdlibc++ should be more stable comparing with boost. * MCOL-5311 Add timezone to jobList in subquerytransformer TimeZone was uninitialized in this scenario and led to undefined behavior. * patch_out_of_band Some changes made to 10.6-enterprise make a build using the out-of-band method of compiling columnstore not work. Out-of-band means the source for the engine is not in the storage subdir of server, but rather in a stand alone directory. This is used by developers for easier development work. In the case of out-of-band, INSTALL_LAYOUT is false in CMakeLists.txt * MCOL-5346 This patch forces TreeNode::getIntValue to use conversion for dict-based CHAR/VARCHAR and TEXT columns (#2657) Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com> * MCOL-5263 Add support to ROLLBACK when PP were restarted. DMLProc starts ROLLBACK when SELECT part of UPDATE fails b/c EM facility in PP were restarted. Unfortunately this ROLLBACK stuck if EM/PP are not yet available. DMLProc must have a t/o with re-try doing ROLLBACK. * MCOL-3561 This patch updates Connector code after MDEV-29988 * This commit applies the code style format Co-authored-by: Sergei Golubchik <serg@mariadb.com> Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com> Co-authored-by: David.Hall <david.hall@mariadb.com> Co-authored-by: Gagan Goel <gagan.nith@gmail.com> Co-authored-by: Denis Khalikov <dennis.khalikov@gmail.com>	2022-12-28 21:15:39 +03:00
Denis Khalikov	242bc75166	MCOL-5195 Correlated subquery with equi/non-equi scalar filter and join condition Disable the check for correlated subqueries; basically, those types of queries transform to join (aggr(table2), table1), table2) and a post-join scalar filter.	2022-12-23 18:33:01 +03:00
Sergei Golubchik	246a4db8de	fix C API includes ColumnStore used to include server's mysql.h but link all tools with libmariadb.so There's no guarantee that this would work, even with workarounds it had in dbcon/mysql/sm.cpp Fix: * tools (linked with libmariadb.so) must include libmariadb's mysql.h * as a hack prevent service_thd_timezone.h from being loaded into tools, as it conflicts with libmariadb's mysql.h * server plugin must include server's mysql.h * also don't link every tool with libmariadb.so, link the helper library (liblibmysqlclient.so) that actually needs it, tools use this helper library, not libmariadb.so directly	2022-11-17 12:02:07 -06:00
Sergei Golubchik	21c3bbce16	do not link ha_columnstore.so with libmariadb.so this means some libraries have to be compiled twice - for tools with libmariadb.so and for plugin, without.	2022-11-17 11:53:42 -06:00
Roman Nozdrin	e71bb49267	Merge pull request #2617 from mariadb-corporation/spetrunia-tmp Add ability to compile against MariaDB with new cost model	2022-11-10 20:49:08 +03:00
Sergei Petrunia	b7fc8e4609	Add ability to compile against MariaDB with new cost model If MariaDB defines MARIADB_NEW_COST_MODEL, then ha_mcs::scan_time() has a different signature.	2022-11-10 18:16:44 +03:00
Leonid Fedorov	37fd915a08	Serg`s patch for develop-6 revised for develop https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/2614	2022-11-09 22:41:38 +00:00
Leonid Fedorov	88404f70f1	MCOL-5013: added terminated_by, enclosed_by, escaped_by for s3dataload	2022-11-03 15:46:08 +03:00
David.Hall	6d680ceb8c	MCOL-603 Add error message for sum(a=1) (#2597 ) * MCOL-603 Add error message for sum(a=1) This isn't currently supported, but rather than emitting an error, it asserted and crashed.	2022-11-01 10:13:40 -05:00
mariadb-AndreyPiskunov	d7f4ec73c5	Small fixes + test sorting	2022-10-31 14:56:32 +02:00
mariadb-AndreyPiskunov	315e4be2d8	First working attempt for json_arrayagg	2022-10-31 14:56:32 +02:00
mariadb-AndreyPiskunov	1714b75434	Non working attempt to do MCOL-5227	2022-10-31 14:56:32 +02:00
Roman Nozdrin	a0086bc561	Adding NULL flag into ConstString class	2022-10-21 18:13:18 +00:00
Gagan Goel	8a9b6b32e7	MCOL-5000 Disable ALTER TABLE statement execution on replicas. Exit early from the plugin execution of ALTER TABLE statements on the replica nodes. This is to prevent re-execution of syscat table population from the replica nodes which should only be executed once by the primary node in a CS cluster setup.	2022-10-14 17:20:55 +00:00
david.hall	28a12eda82	Fix up CmakeLists.txt A better way to fix the dependencies	2022-09-06 16:15:07 -05:00
david.hall	bcaf867731	Fix up cmake to build out of band The main CmakeLists.txt was using MY_CHECK_AND_SET_COMPILER_FLAG before the include. This works in-band with server because it was already included in server's CmakeLists.txt. dbcon/mysql included curl as a build dependency. We don't build curl. It's a lib dependency. Not sure why it works in-band. One wouldn't think it should.	2022-09-06 16:08:47 -05:00
Ziy1-Tan	cdd41f05f3	MCOL-785 Implement DISTRIBUTED JSON functions The following functions are created: Create function JSON_VALID and test cases Create function JSON_DEPTH and test cases Create function JSON_LENGTH and test cases Create function JSON_EQUALS and test cases Create function JSON_NORMALIZE and test cases Create function JSON_TYPE and test cases Create function JSON_OBJECT and test cases Create function JSON_ARRAY and test cases Create function JSON_KEYS and test cases Create function JSON_EXISTS and test cases Create function JSON_QUOTE/JSON_UNQUOTE and test cases Create function JSON_COMPACT/DETAILED/LOOSE and test cases Create function JSON_MERGE and test cases Create function JSON_MERGE_PATCH and test cases Create function JSON_VALUE and test cases Create function JSON_QUERY and test cases Create function JSON_CONTAINS and test cases Create function JSON_ARRAY_APPEND and test cases Create function JSON_ARRAY_INSERT and test cases Create function JSON_INSERT/REPLACE/SET and test cases Create function JSON_REMOVE and test cases Create function JSON_CONTAINS_PATH and test cases Create function JSON_OVERLAPS and test cases Create function JSON_EXTRACT and test cases Create function JSON_SEARCH and test cases Note: The output of some functions differs from MDB because session variables affect their output, e.g. JSON_QUOTE/JSON_UNQUOTE. This depends on MCOL-5212	2022-08-30 22:22:23 +08:00
Roman Nozdrin	72e264e8ef	MCOL-5199 This patch solves the overall performance degradation introduced with a new way of char columns hashing in aggregation code. The patch disables padding that forces hasher to calculate over the whole 2k buffer. This patch also moves hashing code into the common place where it belongs.	2022-08-24 19:07:06 +00:00
Leonid Fedorov	d02b3403b7	Changed function name, schema and params order to achieve columnstore_info.load_from_s3("<bucket>", "<file_name>", "<db_name>", "<table_name>");	2022-08-18 10:41:22 +00:00
Roman Nozdrin	56bbef62e6	Merge pull request #2406 from tntnatbry/MCOL-5021-dev MCOL-5021 AUX column implementation to improve DELETE performance.	2022-08-15 19:03:42 +03:00
David.Hall	2020f35e88	Mcol 5092 MODA uses wrong column width for some types (#2450 ) * MCOL-5092 Ensure column width is correct for datatype Change MODA return type to STRING Modify MODA to handle every numeric type * MCOL-5162 MODA to support char and varchar with collation support Fixes to the aggregate bit functions When we fixed the storage sign issue for MCOL-5092, it uncovered a problem in the bit aggregates (bit_and, bit_or and bit_xor). These aggregates should always return UBIGINT, but they relied on the type of the argument column, which gave bad results.	2022-08-11 15:16:11 -05:00
Gagan Goel	86df9a972c	MCOL-5021 Add prototype support for the AUX column in CREATE/DROP DDL commands, single and multi-value INSERTs, cpimport, and DELETE.	2022-08-05 14:40:49 -04:00
David.Hall	d3b57ec767	MCOL-4800 emit error if IN filter > 65535 entries (#2480 ) * MCOL-4800 emit error if IN filter > 65535 entries	2022-08-04 19:21:58 +03:00
David.Hall	08bef648b3	Mcol 5074 Case with In and aggregates asserts (#2435 ) * MCOL-5074 CASE with IN and aggregate asserts gwip-scsp wasn't set and buildPredicateItem() was called which assumes it is set. Added code to set properly in this case	2022-07-11 16:20:15 -05:00
david.hall	c71d11cb3f	Restore calonlinealter	2022-07-06 09:22:49 -05:00
Leonid Fedorov	242769d542	Mistype bug error handler fix	2022-07-05 18:48:30 +03:00
Roman Nozdrin	a3bc3de5f4	Merge pull request #2432 from mariadb-corporation/dataload-raw MCOL-5013: Load Data from S3 into Columnstore	2022-07-05 13:06:53 +03:00
Roman Nozdrin	38c4b973dd	Merge pull request #2421 from denis0x0D/MCOL-4778 [MCOL-4778] Return if we have an error in push_down_init.	2022-07-04 21:16:13 +03:00
Leonid Fedorov	110d9cfab5	Review fixes	2022-07-04 19:52:37 +03:00
Leonid Fedorov	f5b2a6885f	MCOL-5013: Load Data from S3 into Columnstore Introduced UDF and stored procedure. usage: set columnstore_s3_key='<s3_key>'; set columnstore_s3_secret='<s3_secret>'; set columnstore_s3_region='region'; and then use UDF select columnstore_dataload("<tablename>", "<filename>", "<bucket>", "<db_name>"); for UDF db_name can be omitted, then current connection db will be used or stored function call calpontsys.columnstore_load_from_s3("<tablename>", "<filename>", "<bucket>", "<db_name>");	2022-07-04 19:52:37 +03:00
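
Note on commit 10f1a7abbc (MCOL-5480): the message explains that MEDIUMINT values occupy 24 bits and that reading them through a 4-byte routine produced wrong values. The following is a minimal sketch of that idea only, not the ColumnStore implementation. The function names are hypothetical, and the little-endian 3-byte layout is an assumption made for the illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; these are NOT the
// WriteBatchField::ColWriteBatchXInt24() functions named in the commit.
// Assumption: the value arrives as exactly 3 bytes, least significant first.

// Assemble an unsigned 24-bit value from the 3-byte buffer.
inline uint32_t decodeUInt24(const unsigned char* buf)
{
  return static_cast<uint32_t>(buf[0]) |
         (static_cast<uint32_t>(buf[1]) << 8) |
         (static_cast<uint32_t>(buf[2]) << 16);
}

// Assemble a signed 24-bit value and sign-extend it from bit 23.
inline int32_t decodeSInt24(const unsigned char* buf)
{
  const int32_t raw = static_cast<int32_t>(decodeUInt24(buf));
  // Flipping bit 23 and subtracting 2^23 maps 0x800000..0xFFFFFF
  // onto the negative range without touching a fourth byte.
  return (raw ^ 0x800000) - 0x800000;
}

// Text form of the signed value, as would be written to a text pipe.
inline std::string sint24ToText(const unsigned char* buf)
{
  return std::to_string(decodeSInt24(buf));
}
```

Only three bytes are ever read, so whatever follows the value in memory cannot leak into the result, which is the failure mode a 4-byte read invites.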


991 Commits
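
Note on commit b6808c97f1 (MCOL-4530): the message describes rewriting expressions of the form "(A and B) or (A and C)" by taking out the common factor. The sketch below shows that rewrite on a toy expression tree. The `Expr` type and every function here are hypothetical and unrelated to execplan::ParseTree, and the sketch only recognises a common factor in the left operand of both conjunctions.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy expression tree for illustration only (not execplan::ParseTree).
struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr
{
  enum class Kind { Leaf, And, Or };
  Kind kind;
  std::string name;  // used by Leaf nodes only
  ExprPtr lhs, rhs;  // used by And / Or nodes only
};

ExprPtr leaf(std::string n)
{
  return std::make_shared<Expr>(Expr{Expr::Kind::Leaf, std::move(n), nullptr, nullptr});
}

ExprPtr node(Expr::Kind k, ExprPtr l, ExprPtr r)
{
  return std::make_shared<Expr>(Expr{k, "", std::move(l), std::move(r)});
}

// Structural equality of two subtrees.
bool same(const ExprPtr& a, const ExprPtr& b)
{
  if (!a || !b) return a == b;
  if (a->kind != b->kind) return false;
  if (a->kind == Expr::Kind::Leaf) return a->name == b->name;
  return same(a->lhs, b->lhs) && same(a->rhs, b->rhs);
}

// Rewrite (A and B) or (A and C) into A and (B or C).
// Any other shape is returned unchanged.
ExprPtr factorCommon(const ExprPtr& e)
{
  using K = Expr::Kind;
  if (e->kind == K::Or && e->lhs->kind == K::And && e->rhs->kind == K::And &&
      same(e->lhs->lhs, e->rhs->lhs))
  {
    return node(K::And, e->lhs->lhs, node(K::Or, e->lhs->rhs, e->rhs->rhs));
  }
  return e;
}

// Fully parenthesised text form, handy for inspecting the result.
std::string toString(const ExprPtr& e)
{
  if (e->kind == Expr::Kind::Leaf) return e->name;
  const char* op = (e->kind == Expr::Kind::And) ? " and " : " or ";
  return "(" + toString(e->lhs) + op + toString(e->rhs) + ")";
}
```

The rewritten form evaluates A once instead of twice and exposes it as a top-level conjunct. A fuller version would also have to match the common factor in either operand position and across more than two branches.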