mariadb-columnstore-engine

mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-09-11 08:50:45 +03:00

Author	SHA1	Message	Date
Sergey Zefirov	60dc7550f1	fix(group by, having): MCOL-5776: GROUP BY/HAVING closer to server's (#3371) This patch introduces an internal aggregate operator SELECT_SOME that is automatically added to columns that are not in GROUP BY. It "computes" some plausible value of the column (actually, the last one passed). Along the way it fixes incorrect handling of HAVING being transferred into WHERE, window function handling, and a few other inconsistencies.	2024-12-20 19:11:47 +00:00
Aleksei Antipovskii	23048e9749	fix(aggregation): remove double returnMemory()	2024-12-11 12:02:24 +00:00
Denis Khalikov	928678499a	fix(aggregation, RAM): MCOL-5715 Changes the second phase aggregation. (#3171 ) This patch changes the second phase aggregation pipeline - takes into account current memory consumption. Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com> Co-authored-by: drrtuy <roman.nozdrin@mariadb.com>	2024-12-11 12:02:24 +00:00
Aleksei Antipovskii	e0a01c6cf4	Reapply "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 )" This reverts commit `a5c12b98d7`.	2024-12-11 12:02:24 +00:00
Serguey Zefirov	39a976c39a	fix(ubsan): MCOL-5844 - iron out UBSAN reports The most important fix here is the fix of a possible buffer overrun in the DATEFORMAT() function. A "%W" format, repeated enough times, would overflow the 256-byte buffer for the result. Now we use ostringstream to construct the result and we are safe. Changes in date/time projection functions made me fix a difference between our behavior and the server's. The new, better behavior is reflected in changes in tests' results. Also, there was incorrect logic in the TRUNCATE() and ROUND() functions in computing the decimal "shift."	2024-12-10 20:30:58 +04:00
drrtuy	aa4bbc0152	feat(joblist,runtime): this is the first part of the execution model that produces a workload that can be predicted for a given query. - forces the UM join converter to use a value from the configuration - replaces the constant that controls the number of outstanding requests with a value that depends on column width - modifies related Columnstore.xml values	2024-12-03 22:18:21 +00:00
drrtuy	eaba4d33b4	fix(DEC):MCOL-5805,5808 to resolve UM-only node crash inside DEC when there is no local PP to send the local requests to. (#3350 ) * Revert "fix(DEC): MCOL-5602 fixing potentially endless loop in DEC (#3049)" This reverts commit `1d416bc6ed`. * fix(DEC):MCOL-5805,5808 to resolve UM-only node crash inside DEC when there is no local PP to send the local requests to.	2024-11-11 18:31:15 +00:00
Alexey Antipovsky	11136b3545	fix(PrimProc): MCOL-5651 Add a workaround to avoid choosing an incorrect TupleHashJoinStep as a joiner [stable-23.10] (#3331 ) * fix(PrimProc): MCOL-5651 Add a workaround to avoid choosing an incorrect TupleHashJoinStep as a joiner	2024-11-08 12:51:25 +00:00
drrtuy	6f6e69815d	feat(bytestream,serdes): Distribute BS buf size data type change to avoid implicit data type narrowing	2024-11-08 16:28:51 +04:00
drrtuy	dc03621e9d	fix(rowgroup): RGData now uses a uint64_t counter for the fixed-size columns' data buffer. The buffer can use > 4GB of RAM, which is necessary for PM-side joins. The RGData ctor used uint32_t when allocating the data buffer, which caused an implicit heap overflow.	2024-11-08 16:28:51 +04:00
drrtuy	ce86d1025a	feat(joblist): better dot graphs that represents joblist steps execution tree.	2024-09-26 18:51:49 +04:00
drrtuy	6757535b6e	fix(join, UM, perf): UM join is multi-threaded now (#3286 ) * chore: UM join is multi-threaded now * fix(UMjoin): replace TR1 maps with stdlib versions	2024-09-04 18:56:35 +04:00
Roman Nozdrin	a5c12b98d7	Revert "fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 )" This reverts commit `c7caa4374f`.	2024-07-07 13:09:56 +00:00
Sergey Zefirov	db4cb1d657	MCOL-4234 and MCOL-5772 cherry-picked into [stable 23.10] (#3226) * MCOL-4234: improve GROUP BY and ORDER BY interaction (#3194) This patch fixes the problem in MCOL-4234 and also generally improves the behavior of GROUP BY. It does so by introducing a "dummy" aggregate and by wrapping columns into it. This allows columns that are not in GROUP BY to be used more freely, for example, in SELECT * FROM tbl GROUP BY col - all columns that are not "col" will be wrapped into an aggregate and the query will proceed to execution. The dummy aggregate itself does nothing more than remember the last value passed into it. There is also an additional error message that tries to explain what types of expressions can be wrapped into an aggregate. * MCOL-5772: incorrect ORDER BY ordering for columns not in GROUP BY (#3214) When an ORDER BY column is not in GROUP BY, is not an aggregate, and there is a SELECT column that is also not an aggregate, there was a problem: ordering happened on the SELECTed column, not the ORDERed one. This patch fixes that particular problem and also performs some tidying around the newly added aggregate. --------- Co-authored-by: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com>	2024-06-28 00:31:53 +04:00
drrtuy	c7caa4374f	fix(aggregation, disk-based) MCOL-5691 distinct aggregate disk based (#3145 ) * fix(aggregation, disk-based): MCOL-5689 this fixes disk-based distinct aggregation functions Previously disk-based distinct aggregation functions produced incorrect results b/c there was no finalization applied for previous generations stored on disk. * fix(aggregation, disk-based): Fix disk-based COUNT(DISTINCT ...) queries. (Case 2). (Distinct & Multi-Distinct, Single- & Multi-Threaded). * fix(aggregation, disk-based): Fix disk-based DISTINCT & GROUP BY queries. (Case 1). (Distinct & Multi-Distinct, Single- & Multi-Threaded). --------- Co-authored-by: Theresa Hradilak <theresa.hradilak@gmail.com> Co-authored-by: Roman Nozdrin <rnozdrin@mariadb.com>	2024-03-24 18:04:37 +03:00
Denis Khalikov	2f29e22ba0	fix(DEC): MCOL-5637 Initialize a new bytestream before write to PS (#3118 )	2024-02-09 22:27:54 +03:00
Denis Khalikov	041eb2ec8a	fix(disk-based-join): MCOL-5626 Fix for race in DJS with outer join.	2023-12-18 17:24:27 +03:00
drrtuy	1d416bc6ed	fix(DEC): MCOL-5602 fixing potentially endless loop in DEC (#3049 )	2023-12-05 18:39:24 +03:00
Denis Khalikov	58e18eeb56	fix(aggregation): MCOL-5467 Add support for duplicate expressions in group by. (#3052 ) This patch adds support for duplicate expressions (builtin_functions) with one argument in select statement and group by statement.	2023-12-05 18:29:44 +03:00
Gagan Goel	320df831c6	MCOL-5572 Force the charset on the autoincrement column of (#2976 ) calpontsys.syscolumn syscat table to be latin1. This change is done in one of the ctors of pColStep which is initiated while building the job list from the execution plan.	2023-09-28 22:03:39 +03:00
Sergey Zefirov	920607520c	feat(runtime)!: MCOL-678 A "GROUP BY ... WITH ROLLUP" support Adds a special column which helps to differentiate data and rollups of various depths and a simple logic to row aggregation to add processing of subtotals.	2023-09-26 17:01:53 +03:00
Leonid Fedorov	8171e9da07	Fix rocky-8 vanilla compiler build (#2959 ) Co-authored-by: Leonid Fedorov <leonid.fedorov@mariad.com>	2023-09-20 04:04:08 +03:00
Denis Khalikov	add3a57e8d	MCOL-5539 Put table on small side if it was involved in prev.join. (#2945 )	2023-09-05 12:19:43 +03:00
Denis Khalikov	896e8dd769	MCOL-5522 Properly process pm join result count. (#2909) This patch: 1. Properly processes the situation when the pm join result count is exceeded. 2. Adds session variable `columnstore_max_pm_join_result_count` to control the limit.	2023-08-04 16:55:45 +03:00
Denis Khalikov	2a66ae2ed1	MCOL-5514 Parallel disk join step.	2023-07-11 14:05:14 +03:00
Sergei Golubchik	a8be4a3787	compiler warnings like dbcon/joblist/batchprimitiveprocessor-jl.cpp:893:54: error: pointer used after ‘void operator delete [](void*, std::size_t)’ [-Werror=use-after-free] 893 \| joinResults.reset(new vector<uint32_t>[8192]); \| ^	2023-07-04 12:58:18 -04:00
Denis Khalikov	2aba28d855	Merge pull request #2851 from denis0x0D/MCOL-5477 MCOL-5477 Disk join step improvement.	2023-06-26 11:02:20 +03:00
Denis Khalikov	1f190a6e75	MCOL-5477 Disk join step improvement. This patch: 1. Handles the corner case when a bucket has exceeded the memory limit, but we cannot redistribute the data in this bucket into new buckets based on a hash algorithm, because the rows have the same values. 2. Adds a force option for the disk join step. 3. Adds an option to control the depth of the partition tree.	2023-06-23 18:40:15 +03:00
Denis Khalikov	024e6bd358	MCOL-5512 Fix for post join filter. This patch fixes certain situations where the post join filter is not applied.	2023-06-09 11:15:05 +03:00
Roman Nozdrin	62dc392476	MCOL-5499 Enable ControlFlow for same node communication processing path to avoid DEC queue overloading (#2848 )	2023-06-07 15:41:59 +03:00
Leonid Fedorov	8f93fc3623	MCOL-5493: First portion of UBSan fixes (#2842 ) Multiple UB fixes	2023-06-02 17:02:09 +03:00
Gagan Goel	87eb875379	MCOL-5491 Enable StringStore for long strings in JSON_ARRAYAGG processing. This patch is the JSON_ARRAYAGG clone of the changes done in MCOL-5429 where we enabled usage of StringStore for long strings in GROUP_CONCAT() processing to reduce memory footprint of PrimProc and thus avoiding a potential OS triggered OOM crash.	2023-05-12 19:45:02 +00:00
Gagan Goel	0be1c3dc8f	MCOL-5429 Fix high memory consumption in GROUP_CONCAT() processing. 1. Input and output RowGroup's used in GROUP_CONCAT classes are currently allocating a raw memory buffer of size equal to the actual width of the string datatype. As an example, for the following query: SELECT col1, GROUP_CONCAT(col2) FROM t GROUP BY col1; If col2 is a TEXT field with default width, the input RowGroup containing the target rows to be concatenated will assign 64kb of memory for every input row in the RowGroup. This is wasteful as actual field values in real workloads would be much smaller. We fix this by enabling the RowGroup to use the StringStore when the RowGroup contains long strings. 2. RowAggregation::initialize() allocates a memory buffer for a NULL row. The size of this buffer is equal to the row size for the output RowGroup. For the above scenario, using the default group_concat_max_len (which is a server variable that sets the maximum length of the GROUP_CONCAT string) value of 1mb, the buffer size would be (1mb + 64kb + some additional metadata). If the user sets group_concat_max_len to a higher value, say 3gb, this buffer size would be ~3gb. Now if the runtime initiates several instances of RowAggregation, total memory consumption by PrimProc could exceed the hardware memory limits causing the OS OOM to kill the process. We fix this problem by again enabling the StringStore for the NULL row allocation. 3. In the plugin code in buildAggregateColumn(), there is an integer overflow when the server group_concat_max_len variable (which is an uint32_t) is set to a value > INT32_MAX (such as 3gb) and is assigned to CalpontSystemCatalog::ColType::colWidth (which is an int32_t). As a short term fix, we saturate the assigned value to colWidth to INT32_MAX. Proper fix would be to upgrade CalpontSystemCatalog::ColType::colWidth to an uint32_t.	2023-05-01 13:06:23 -04:00
Roman Nozdrin	4fe9cd64a3	Revert "No boost condition (#2822 )" (#2828 ) This reverts commit `f916e64927`.	2023-04-22 15:49:50 +03:00
Leonid Fedorov	f916e64927	No boost condition (#2822 ) This patch replaces boost primitives with stdlib counterparts.	2023-04-22 00:42:45 +03:00
Leonid Fedorov	c2d0fa24da	replace boost::shared_array<T> to std::shared_ptr<T[]>	2023-04-14 10:33:27 +00:00
Leonid Fedorov	a508b86091	remove boost/shared_array include	2023-04-14 09:42:50 +00:00
Leonid Fedorov	6c32c658d5	MCOL-5385: Delete RowGroup::setData and make Pointer ctor explicit (#2808 ) * Delete RowGroup::setData and make Pointer ctor explicit * some push_backs replaced with emplace_backs * Fixes of review notes	2023-04-13 03:55:30 +03:00
Leonid Fedorov	2e1394149b	MCOL-5464: Fixes of bugs from ASAN warnings, part one (#2792 ) * Fixes of bugs from ASAN warnings, part one * MQC as static library, with nifty counter for global map and mutex * Switch clang to 16 * link messageqcpp to execplan	2023-04-04 02:33:23 +03:00
Sergey Zefirov	b53c231ca6	MCOL-271 empty strings should not be NULLs (#2794 ) This patch improves handling of NULLs in textual fields in ColumnStore. Previously empty strings were considered NULLs and it could be a problem if data scheme allows for empty strings. It was also one of major reasons of behavior difference between ColumnStore and other engines in MariaDB family. Also, this patch fixes some other bugs and incorrect behavior, for example, incorrect comparison for "column <= ''" which evaluates to constant True for all purposes before this patch.	2023-03-30 21:18:29 +03:00
Roman Nozdrin	786b9da5b0	MCOL-5438 COUNT() in math causes SEGV	2023-03-09 20:35:38 +00:00
Leonid Fedorov	56f2346083	Remove windows ifdefs	2023-03-02 15:59:42 +00:00
Gagan Goel	2f1f9c0ef0	MDEV-25080 Some fixes: 1. In TupleUnion::writeNull(), add the missing switch case for wide decimal with 16bytes column width. 2. MCOL-5432 Disable complete/partial pushdown of UNION operation if the query involves an ORDER BY or a LIMIT clause, until MCOL-5222 is fixed. Also add MTR test cases for this.	2023-02-27 06:38:31 -05:00
Gagan Goel	86dcf92d56	MCOL-5215 Fix overflow of UNION operation involving DECIMAL datatypes. When a UNION operation involving DECIMAL datatypes with scale and digits before the decimal exceeds the currently supported maximum precision of 38, we throw an error to the user: "MCS-2060: Union operation exceeds maximum DECIMAL precision of 38". This is until MCOL-5417 is implemented where ColumnStore will have full parity with MariaDB server in terms of maximum supported DECIMAL precision and scale of 65 and 38 digits respectively.	2023-02-27 06:38:31 -05:00
Leonid Fedorov	d87206c3e4	Fix segfault in getLocalNetIfacesSins (#2713 )	2023-01-26 16:21:21 +03:00
Roman Nozdrin	c7c182ebd2	Merge pull request #2684 from drrtuy/MCOL-5385 MCOL-5385 This patch reduces RAM consumption and adds GROUP_CONCAT RA…	2023-01-18 11:58:47 +03:00
Leonid Fedorov	d42485656c	Fix clang 16 warnings for comfort build	2023-01-12 22:11:28 +03:00
Roman Nozdrin	d0eea0ffe8	MCOL-5385 This patch reduces RAM consumption and adds GROUP_CONCAT RAM accounting feature	2023-01-11 09:52:10 +00:00
Roman Nozdrin	15f65eff15	Merge pull request #2655 from denis0x0D/MCOL-5263_2 MCOL-5263 Add support to ROLLBACK when PP were restarted.	2022-12-13 21:24:01 +03:00
Denis Khalikov	d61780cab1	MCOL-5263 Add support to ROLLBACK when PP were restarted. DMLProc starts ROLLBACK when the SELECT part of an UPDATE fails b/c the EM facility in PP was restarted. Unfortunately this ROLLBACK gets stuck if EM/PP are not yet available. DMLProc must have a t/o and retry the ROLLBACK.	2022-12-13 16:18:53 +03:00
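
The MCOL-4234 and MCOL-5776 entries above describe a "dummy" aggregate (SELECT_SOME) that wraps columns missing from GROUP BY and simply remembers the last value passed to it. The sketch below illustrates that idea only; `LastValueAggregate`, `Row`, and `groupKeepingLastValue` are hypothetical names and do not come from the ColumnStore sources, which work on RowGroups rather than on standard containers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a "remember the last value" aggregate.
// It performs no computation: each update overwrites the stored value.
struct LastValueAggregate
{
  std::string value;
  bool hasValue = false;

  void update(const std::string& next)
  {
    value = next;
    hasValue = true;
  }
};

// A row is (GROUP BY key, non-grouped column value); an assumption of this sketch.
using Row = std::pair<std::string, std::string>;

// Groups rows by key and keeps, for every key, the last value seen in input order.
std::map<std::string, std::string> groupKeepingLastValue(const std::vector<Row>& rows)
{
  std::map<std::string, LastValueAggregate> groups;
  for (const auto& row : rows)
    groups[row.first].update(row.second);

  std::map<std::string, std::string> result;
  for (const auto& entry : groups)
  {
    // Every group was created by at least one update() call.
    assert(entry.second.hasValue);
    result.emplace(entry.first, entry.second.value);
  }
  return result;
}
```

With the input rows ("a", "1"), ("b", "2"), ("a", "3"), group "a" ends up holding "3" because that was the last value passed for it. Which value is "last" depends on the order in which rows arrive, which is why the commit message calls the result only "plausible".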


514 Commits
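
Item 3 of the MCOL-5429 entry in the log above mentions saturating a `uint32_t` value (the server's `group_concat_max_len`) to INT32_MAX before storing it in an `int32_t` column width. A minimal sketch of such a saturating conversion follows; the helper name `saturateToInt32` is hypothetical and the actual code in `buildAggregateColumn()` may look different.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not taken from the ColumnStore sources.
// Converts an unsigned 32-bit value to a signed one, clamping anything
// above INT32_MAX instead of letting it wrap to a negative number.
inline std::int32_t saturateToInt32(std::uint32_t value)
{
  constexpr std::int32_t kMax = std::numeric_limits<std::int32_t>::max();
  if (value > static_cast<std::uint32_t>(kMax))
    return kMax;
  return static_cast<std::int32_t>(value);
}
```

For example, a setting of 3 GB (3221225472) does not fit into `int32_t`, so the helper returns 2147483647, while a value such as 1024 passes through unchanged. As the commit message notes, this is a short-term measure; widening the destination type would remove the need for clamping.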