1
0
mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-04-18 21:44:02 +03:00

18 Commits

Author SHA1 Message Date
drrtuy
8ae5a3da40
Fix/mcol 5787 rgdata buffer max size dev (#3325)
* fix(rowgroup): RGData now uses uint64_t counter for the fixed sizes columns data buf.
	The buffer can utilize > 4GB RAM that is necessary for PM side join.
	RGData ctor uses uint32_t allocating data buffer.
 	This fact causes implicit heap overflow.

* feat(bytestream,serdes): BS buffer size type is uint64_t
	This necessary to handle 64bit RGData, that comes as
	a separate patch. The pair of patches would allow to
	have PM joins when SmallSide size > 4GB.

* feat(bytestream,serdes): Distribute BS buf size data type change to avoid implicit data type narrowing

* feat(rowgroup): this returns bits lost during cherry-pick. The bits lost caused the first RGData::serialize to crash a process
2024-11-09 19:44:02 +00:00
Denis Khalikov
1c88a7fcd8 MCOL-5597 Rollback changes introduced for DJS.
This patch changes:
1. The number of buckets created on each split.
2. The heuristic which calculates the bucket size.
2024-04-15 19:37:29 +03:00
Sergey Zefirov
ebcf43a517
fix(join, disk-based): MCOL-5597: large side read errors (#3117)
The large side read errors mentioned there can be due to failure to
close file stream properly. Some of the data may still reside in the
file stream buffers, closing must flush it. The flush is an I/O
operation and can fail, leading to partial write and subsequent partial
read.

This patch tries to provide better diagnostics.
2024-02-09 22:25:43 +03:00
Denis Khalikov
2a66ae2ed1 MCOL-5514 Parallel disk join step. 2023-07-11 14:05:14 +03:00
Denis Khalikov
1f190a6e75 MCOL-5477 Disk join step improvement.
This patch:
1. Handles corner case when the bucket exceeded the memory limit, but we cannot redistribute the data in this bucket into new buckets based on a hash algorithm, because the rows have the same values.
2. Adds force option for disk join step.
3. Add a option to contol the depth of the partition tree.
2023-06-23 18:40:15 +03:00
Leonid Fedorov
56f2346083 Remove windows ifdefs 2023-03-02 15:59:42 +00:00
david.hall
3b6449842f Merge branch 'develop' into MCOL-4841
# Conflicts:
#	exemgr/main.cpp
#	oam/etc/Columnstore.xml.singleserver
#	primitives/primproc/primproc.cpp
2022-06-09 10:07:26 -05:00
Leonid Fedorov
c25ae4f378 Use external boost 1.78 2022-05-02 18:23:37 +00:00
David Hall
27dea733c5 MCOL4841 dev port run large join without OOM 2022-02-09 17:33:55 -06:00
Leonid Fedorov
04752ec546 clang format apply 2022-01-21 16:43:49 +00:00
Denis Khalikov
cc1c3629c5 MCOL-987 Add LZ4 compression.
* Adds CompressInterfaceLZ4 which uses LZ4 API for compress/uncompress.
* Adds CMake machinery to search LZ4 on running host.
* All methods which use static data and do not modify any internal data - become `static`,
  so we can use them without creation of the specific object. This is possible, because
  the header specification has not been modified. We still use 2 sections in header, first
  one with file meta data, the second one with pointers for compressed chunks.
* Methods `compress`, `uncompress`, `maxCompressedSize`, `getUncompressedSize` - become
  pure virtual, so we can override them for the other compression algos.
* Adds method `getChunkMagicNumber`, so we can verify chunk magic number
  for each compression algo.
* Renames "s/IDBCompressInterface/CompressInterface/g" according to requirement.
2021-07-06 18:04:37 +03:00
Alexey Antipovsky
475104e4d3 [MCOL-4709] Disk-based aggregation
* Introduce multigeneration aggregation

* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)

* Reduce memory usage while aggregating
* introduce in-memory generations to better memory utilization

* Try to limit the qty of buckets at a low limit

* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory

* do not dump more than half of rowgroups to disk if generations are
  allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
  so the generations start more evenly

* Unify temp data location

* Explicitly create temp subdirectories
  whether disk aggregation/join are enabled or not
2021-06-06 16:09:15 +03:00
Patrick LeBlanc
a09a9d5d0f Mass substitution 'Corporaton' -> 'Corporation' 2019-08-07 14:43:25 -05:00
David Hall
3f2c753947 MCOL-1822-c final checkin 2019-03-05 09:33:39 -06:00
David Hall
c5b9ae11e5 MCOL-1822 add LONG DOUBLE support 2019-01-29 09:55:43 -06:00
David Hill
b7b98a3e1a MCOL-520 2018-09-25 11:32:56 -05:00
Andrew Hutchings
01446d1e22 Reformat all code to coding standard 2017-10-26 17:18:17 +01:00
david hill
f6afc42dd0 the begginning 2016-01-06 14:08:59 -06:00