mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-29 08:21:15 +03:00

Author	SHA1	Message	Date
Aleksei Antipovskii	0ab03c7258	chore(codestyle): mark virtual methods as override	2025-02-21 20:01:34 +04:00
Daniel Black	88e80c1542	fix(utils): MCOL-5881 set/getThreadName use FreeBSD API (#3384) Taken from FreeBSD ports, this uses the FreeBSD APIs rather than the Linux-specific prctl to change and retrieve the thread names. Co-authored-by: Bernard Spil <brnrd@FreeBSD.org>	2025-02-20 16:32:59 +00:00
Sergey Zefirov	60dc7550f1	fix(group by, having): MCOL-5776: GROUP BY/HAVING closer to server's (#3371 ) This patch introduces an internal aggregate operator SELECT_SOME that is automatically added to columns that are not in GROUP BY. It "computes" some plausible value of the column (actually, last one passed). Along the way it fixes incorrect handling of HAVING being transferred into WHERE, window function handling and a bit of other inconsistencies.	2024-12-20 19:11:47 +00:00
Aleksei Antipovskii	e0a01c6cf4	Reapply "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 )" This reverts commit `a5c12b98d7`.	2024-12-11 12:02:24 +00:00
Serguey Zefirov	39a976c39a	fix(ubsan): MCOL-5844 - iron out UBSAN reports The most important fix here is the fix of a possible buffer overrun in the DATEFORMAT() function. A "%W" format, repeated enough times, would overflow the 256-byte buffer for the result. Now we use ostringstream to construct the result and we are safe. Changes in date/time projection functions made me fix a difference between our behavior and the server's. The new, better behavior is reflected in changes in tests' results. Also, there was incorrect logic in the TRUNCATE() and ROUND() functions in computing the decimal "shift."	2024-12-10 20:30:58 +04:00
drrtuy	aa4bbc0152	feat(joblist,runtime): this is the first part of the execution model that produces a workload that can be predicted for a given query. * feat(joblist,runtime): this is the first part of the execution model that produces a workload that can be predicted for a given query. - forces to UM join converter to use a value from a configuration - replaces a constant used to control a number of outstanding requests with a value depends on column width - modifies related Columnstore.xml values	2024-12-03 22:18:21 +00:00
drrtuy	eaba4d33b4	fix(DEC):MCOL-5805,5808 to resolve UM-only node crash inside DEC when there is no local PP to send the local requests to. (#3350 ) * Revert "fix(DEC): MCOL-5602 fixing potentially endless loop in DEC (#3049)" This reverts commit `1d416bc6ed`. * fix(DEC):MCOL-5805,5808 to resolve UM-only node crash inside DEC when there is no local PP to send the local requests to.	2024-11-11 18:31:15 +00:00
Alexey Antipovsky	11136b3545	fix(PrimProc): MCOL-5651 Add a workaround to avoid choosing an incorrect TupleHashJoinStep as a joiner [stable-23.10] (#3331 ) * fix(PrimProc): MCOL-5651 Add a workaround to avoid choosing an incorrect TupleHashJoinStep as a joiner	2024-11-08 12:51:25 +00:00
drrtuy	0a71892d97	feat(rowgroup): this returns bits lost during cherry-pick. The bits lost caused the first RGData::serialize to crash a process	2024-11-08 16:28:51 +04:00
drrtuy	6f6e69815d	feat(bytestream,serdes): Distribute BS buf size data type change to avoid implicit data type narrowing	2024-11-08 16:28:51 +04:00
drrtuy	a947f7341c	feat(bytestream,serdes): BS buffer size type is uint64_t This is necessary to handle 64-bit RGData, which comes as a separate patch. The pair of patches would allow PM joins when SmallSide size > 4GB.	2024-11-08 16:28:51 +04:00
drrtuy	dc03621e9d	fix(rowgroup): RGData now uses uint64_t counter for the fixed sizes columns data buf. The buffer can utilize > 4GB RAM that is necessary for PM side join. RGData ctor uses uint32_t allocating data buffer. This fact causes implicit heap overflow.	2024-11-08 16:28:51 +04:00
drrtuy	6757535b6e	fix(join, UM, perf): UM join is multi-threaded now (#3286 ) * chore: UM join is multi-threaded now * fix(UMjoin): replace TR1 maps with stdlib versions	2024-09-04 18:56:35 +04:00
Leonid Fedorov	4b411b3968	MCOL-4696: get rid of boost::iequals	2024-08-21 21:35:52 +04:00
Aleksei Antipovskii	4aa281645e	feat(SM): MCOL-5785 S3Storage improvements Update libmarias3 fix build with the recent libmarias3 feat(SM): MCOL-5785 Add timeout options for S3Storage In some unfortunate situations StorageManager may get stuck on network operations. This commit adds the ability to set network timeouts which will help to ensure that the system is more responsive. feat(SM): MCOL-5785 Add smps & smkill tools * `smps` shows all active S3 network operations * `smkill` terminates S3 network operations NB! At the moment smkill is able to terminate operations that are stuck on retries, but not hang inside the libcurl call. In other words if you want to terminate all operations you should configure `connect_timeout` & `timeout` Install smkill & smps Add install for new binaries	2024-08-21 20:45:38 +04:00
Roman Nozdrin	a5c12b98d7	Revert "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 )" This reverts commit `c7caa4374f`.	2024-07-07 13:09:56 +00:00
drrtuy	d089171637	fix(compilation): add explicit comparison operators and explicitly cast a type (#3229 )	2024-06-29 07:53:54 +01:00
Sergey Zefirov	db4cb1d657	MCOL-4234 and MCOL-5772 cherry-picked into [stable 23.10] (#3226) * MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194) This patch fixes the problem in MCOL-4234 and also generally improves behavior of GROUP BY. It does so by introducing a "dummy" aggregate and by wrapping columns into it. This allows for columns that are not in GROUP BY to be used more freely, for example, in SELECT * FROM tbl GROUP BY col - all columns that are not "col" will be wrapped into an aggregate and the query will proceed to execution. The dummy aggregate itself does nothing more than remember the last value passed into it. There is also an additional error message that tries to explain what types of expressions can be wrapped into an aggregate. * MCOL-5772: incorrect ORDER BY ordering for columns not in GROUP BY (#3214) When an ORDER BY column is not in GROUP BY, is not an aggregate and there is a SELECT column that is also not an aggregate, there was a problem: ordering happened on the SELECTed column, not the ORDERed one. This patch fixes that particular problem and also performs some tidying around the newly added aggregate. --------- Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>	2024-06-28 00:31:53 +04:00
Denis Khalikov	9f4231f87f	MCOL-5708 Calculate precision and scale for constant decimal. (#3227 ) This patch calculates precision and scale for constant decimal value for SUM aggregation function.	2024-06-28 00:31:03 +04:00
Denis Khalikov	985cd94402	fix(join, disk-based): MCOL-5597: large side read errors (#3117 ) (#3225 ) The large side read errors mentioned there can be due to failure to close file stream properly. Some of the data may still reside in the file stream buffers, closing must flush it. The flush is an I/O operation and can fail, leading to partial write and subsequent partial read. This patch tries to provide better diagnostics. Co-authored-by: Sergey Zefirov <72864488+mariadb-SergeyZefirov@users.noreply.github.com>	2024-06-27 17:24:45 +04:00
Denis Khalikov	d6db3552c3	MCOL-5597 Rollback changes introduced for DJS. (#3224 ) This patch changes: 1. The number of buckets created on each split. 2. The heuristic which calculates the bucket size.	2024-06-27 17:22:11 +04:00
Leonid Fedorov	6c6fa7d5a4	MCOL-5328: PCRE based regexp regexp_substr regexp_instr regexp_replace [stable-23.10] (#3215 ) * MCOL-5328: PCRE based regexp regexp_substr regexp_instr regexp_replace * Add qa test for MCOL-5328 --------- Co-authored-by: Susil Behera <susil.behera@mariadb.com>	2024-06-27 14:20:08 +04:00
Leonid Fedorov	cce0f6ab0c	fix(cgroups)!: Containers memory limits for CI (#3108 ) (#3209 ) Limit test containers by memory, fix cgroup path inside the containers by introducing new ugly setting name --------- Co-authored-by: drrtuy <roman.nozdrin@mariadb.com> Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2024-06-20 10:48:20 +01:00
Denis Khalikov	e69dffc6f3	MCOL-5237 Proper handle DATETIME column for "ifnull" function. (#3201 )	2024-06-17 17:58:11 +04:00
Serguey Zefirov	97220501ed	Fixes MCOL-5700, Oracle mode test results This changeset contains fixes in Oracle mode tests and for the implementation of CONCAT_ORACLE. Also, we harmonise our translation process with the recent changes in the server. Due to changed behavior of the server, some CREATE VIEW/EXPLAIN statements began to output unexpected results and needed to be fixed. The concatenation operation's name also changed. This led to the disabled func_concat_oracle test being enabled to test it, and it turned out that our implementation of this function was broken and needed to be fixed too.	2024-04-15 19:35:47 +03:00
drrtuy	c7caa4374f	fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 ) * fix(aggregation, disk-based): MCOL-5689 this fixes disk-based distinct aggregation functions Previously disk-based distinct aggregation functions produced incorrect results b/c there was no finalization applied for previous generations stored on disk. * fix(aggregation, disk-based): Fix disk-based COUNT(DISTINCT ...) queries. (Case 2). (Distinct & Multi-Distinct, Single- & Multi-Threaded). * fix(aggregation, disk-based): Fix disk-based DISTINCT & GROUP BY queries. (Case 1). (Distinct & Multi-Distinct, Single- & Multi-Threaded). --------- Co-authored-by: Theresa Hradilak <theresa.hradilak@gmail.com> Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2024-03-24 18:04:37 +03:00
Leonid Fedorov	9a9b5f8036	fix(join,threadpool): MCOL-5565: MCOL-5636: MCOL-5645: port from develop-23.02 to [stable-23.10] (#3127) * fix(threadpool): MCOL-5565 queries stuck in FairThreadScheduler. (#3100) Meta Primitive Jobs, e.g. ADD_JOINER, LAST_JOINER, get stuck in the Fair scheduler without an out-of-band scheduler. Add the OOB scheduler back to remedy the issue. * fix(messageqcpp): MCOL-5636 same node communication crashes transmitting PP errors to EM b/c error messaging leveraged a socket that was a nullptr. (#3106) * fix(threadpool): MCOL-5645 erroneous threadpool Job ctor implicitly sets socket shared_ptr to nullptr causing sigabrt when threadpool returns an error (#3125) --------- Co-authored-by: drrtuy <roman.nozdrin@mariadb.com>	2024-02-14 14:56:07 +03:00
Leonid Fedorov	d0f657b01f	compilation fix for gcc12 on known gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105329	2024-01-04 11:40:56 +03:00
Leonid Fedorov	0c6876d8e4	perf(primproc) MCOL-5601: Initialize two fields once in ctor instead of calling makeConfig std::string fTmpDir = config::Config::makeConfig()->getTempFileDir(config::Config::TempDirPurpose::Aggregates); std::string fCompStr = config::Config::makeConfig()->getConfig("RowAggregation", "Compression");	2023-12-21 18:19:17 +03:00
Leonid Fedorov	a8f7777951	fix(linkage) link libm to libmarias3	2023-12-18 17:18:53 +03:00
Serguey Zefirov	a9ab71e675	MCOL-5625: Fixes json_query implementation Also extends func_json_value.test.	2023-12-13 16:15:26 +03:00
drrtuy	1d416bc6ed	fix(DEC): MCOL-5602 fixing potentially endless loop in DEC (#3049 )	2023-12-05 18:39:24 +03:00
drrtuy	63b032e3fd	fix(rowstorage): SplitMix64 PRNG implementation to replace stdlib MT PRNG that uses /dev/urandom guarded by spinlock (#3050 )	2023-12-05 18:30:31 +03:00
Sergey Zefirov	71f6a39078	fix(logging): Fixes MCOL-5599 where LIKE operator never finishes (#3048 ) This is a fix of logging subsystem, nothing else. The old code expanded an argument into string and advanced too little and, if expansion contained argument's index, it expanded it again. And again.	2023-12-03 20:17:43 +03:00
Serguey Zefirov	9e37ab82d8	MCOL-5607: JSON function use crashes query execution JSON functions were implemented violating an assumption of their pureness, as they should not have any state. This concrete patch fixes implementation of JSON_VALUE function.	2023-11-30 01:40:36 +04:00
Denis Khalikov	2b01ab1b15	fix(primproc,stringstore): MCOL-5597 Set length for `nullptr` string to 0. (#3034 )	2023-11-23 15:37:37 +03:00
Leonid Fedorov	1f71847e99	fix(packaging) dh_missing: warnings are treated as errors for buildbot debians dh_missing: warning: Compatibility levels before 10 are deprecated (level 9 in use) dh_missing: warning: usr/lib/x86_64-linux-gnu/libmessageqcpp.a exists in debian/tmp but is not installed to anywhere dh_missing: warning: usr/lib/x86_64-linux-gnu/libpron.a exists in debian/tmp but is not installed to anywhere so do not install static libraries as targets on CMake	2023-10-04 13:20:24 -04:00
Leonid Fedorov	86c1c5d537	fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure The new added invariant checking that RGData knows the number of columns and fixed size columns was failing for disk-based aggregation workloads, leading them to provide a wrong result. (The assertion failure happened in RGData::getRow(uint32_t num, Row* row) which is called in the finalization of sub-aggregation results, necessary for merging part results. As the merging failed, duplicate results were output for disk-based aggregation queries. The assertion failure was caused by RGData::deserialize(ByteStream& bs, uint32_t defAmount) not setting rowSize and colCount if necessary (e.g. when the deserialization happens into a new, default RGData, which doesn't know anything about its structure yet. This is the case when the default constructor for RGData() is used, which sets rowSize and columnCount to 0 each. There are three code parts that make use of the default RGData() ctor. The fix is for the use in RowGroupStorage::loadRG(uint64_t rgid, std::unique_ptr<RGData>& rgdata, bool unlinkDump = false), where the default RGData object is used to directly deserialize a ByteStream into it. The deserialize method now checks if both rowSize and columnCount are 0 and if yes sets the read values from the ByteStream for both. We should probably check the other two code parts making use of the default RGData ctor, too. This happens in joinpartition.cpp and tuplejoiner.cpp. --------- Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com>	2023-09-30 00:02:31 +03:00
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support Adds a special column which helps to differentiate data and rollups of various depths and a simple logic to row aggregation to add processing of subtotals.	2023-09-26 17:01:53 +03:00
mariadb-AlexeyVorovich	fd94ab5042	chore(logging): move cgroup /cgroup version log from constructor to getTotalMemory to avoid duplicate log as constructor is called per query	2023-09-25 22:17:09 +03:00
Gagan Goel	7f9c624626	MCOL-5573 Fix cpimport truncation of TEXT columns. 1. Restore the utf8_truncate_point() function in utils/common/utils_utf8.h that I removed as part of the patch for MCOL-4931. 2. As per the definition of TEXT columns, the default column width represents the maximum number of bytes that can be stored in the TEXT column. So the effective maximum length is less if the value contains multi-byte characters. However, if the user explicitly specifies the length of the TEXT column in a table DDL, such as TEXT(65535), then the DDL logic ensures that enough bytes are allocated (up to a system maximum) to allow up to that many characters (multi-byte characters if the charset for the column is multi-byte, such as utf8mb3).	2023-09-20 12:23:22 -04:00
Gagan Goel	931f2b36a1	MCOL-4931 Make cpimport charset-aware. (#2938 ) 1. Extend the following CalpontSystemCatalog member functions to set CalpontSystemCatalog::ColType::charsetNumber, after the system catalog update to add charset number to calpontsys.syscolumn in MCOL-5005: CalpontSystemCatalog::lookupOID CalpontSystemCatalog::colType CalpontSystemCatalog::columnRIDs CalpontSystemCatalog::getSchemaInfo 2. Update cpimport to use the CHARSET_INFO object associated with the charset number retrieved from the system catalog, for a dictionary/non-dictionary CHAR/VARCHAR/TEXT column, to truncate long strings that exceed the target column character length. 3. Add MTR test cases.	2023-09-05 17:17:20 +03:00
mariadb-AlexeyVorovich	5b4f06bf0d	Logging of memory (#2930) * -logging of memory WIP * -better log for cgroup case * -fix log * -display in GIB * add log for freememory for non CGROUP (to be discussed) * test repeated log entries * -added counter for every 1000 call. effectively 15m * Name logging period and increase it, clear config files from PR, add .gitignore --------- Co-authored-by: pgmabv99 <alexey.vorovich@gmail.com> Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>	2023-09-05 15:46:29 +03:00
drrtuy	765dd46b61	fix(pp-threadpool): the workaround for a stuck test001 in CI (#2931) CI occasionally gets stuck running test001 b/c the PP threadpool endlessly reschedules meta jobs, e.g. BATCH_PRIMITIVE_CREATE, whose ByteStreams were somehow damaged or read out. Co-authored-by: Leonid Fedorov <leonid.fedorov@mariadb.com>	2023-08-18 00:02:31 +03:00
Theresa Hradilak	48562e41f9	feat(datatypes): MCOL-4632 and MCOL-4648, fix cast leads to NULL. Remove redundant cast. As C-style casts with a type name in parentheses are interpreted as static_casts this literally just changes the interpretation around (and forces an implicit cast to match the return value of the function). Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency. Make consistent with relation between BIGINTNULL and BIGINTEMPTYROW & make adapted cast behaviour due to NULL markers more intuitive. (After this change we can simply block the highest possible uint64_t value and if a cast results in it, print the next lower value (2^64 - 2). Previously, (2^64 - 1) was able to be printed, but (2^64 - 2) as being blocked by the UBIGINTNULL constant was not, making finding the appropriate replacement value to give out more confusing. Introduce MAX_MCS_UBIGINT and MIN_MCS_BIGINT and adapt casts. Adapt casting to BIGINT to remove NULL marker error. Add bugfix regression test for MCOL 4632 Add regression test for mcol_4648 Revert "Switch UBIGINTNULL and UBIGINTEMPTYROW constants for consistency." This reverts commit 83eac11b18937ecb0b4c754dd48e4cb47310f620. Due to backwards compatibility issues. Refactor casting to MCS[U]Int to datatype functions. Update regression tests to include other affected datatypes. Apply formatting. Refactor according to PR review Remove redundant new constant, switch to using already existing constant. Adapt nullstring casting to EMPTYROW markers for backwards compatibility. Adapt tests for backward compatibility behaviour allowing text datatypes to be casted to EMPTYROW constant. Adapt mcol641-functions test according to bug fix. Update tests according to new expected behaviour. Adapt tests to new understanding of issue. Update comments/documentation for MCOL_4632 test. Adapt to new cast limit logic. Make bracketing consistent. Adapt previous regression test to new expected behaviour.	2023-08-11 13:00:30 +00:00
drrtuy	1a49a09af3	Merge pull request #2878 from denis0x0D/MCOL-5514_dev_1 MCOL-5514 Parallel disk join step	2023-07-25 14:32:13 +01:00
Leonid Fedorov	65cde8c894	feature: pron (#2908) * feature: Special dictionary that we can pass with a session variable to modify codepaths and behaviour for testing and debugging	2023-07-21 14:02:03 +03:00
Leonid Fedorov	8d06822be5	atomic stop flag	2023-07-12 18:17:13 +03:00
Leonid Fedorov	bab29ff495	Simpler Config	2023-07-12 18:15:26 +03:00
Denis Khalikov	2a66ae2ed1	MCOL-5514 Parallel disk join step.	2023-07-11 14:05:14 +03:00


1171 Commits