mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-08-05 16:15:50 +03:00
Commit Graph

14 Commits

Author SHA1 Message Date
Alexey Antipovsky
69fd36847d [MCOL-5213] Fix a rare IO error 2022-09-14 17:09:56 +03:00
Roman Nozdrin
a33597b073 MCOL-5198 This patch enables RowStorage to dump data on disk
using startNewGeneration if there is 50 Megs left free
2022-08-23 21:30:55 +00:00
Alexey Antipovsky
1be82f859b Randomly start a new generation if the free memory is less than 30% 2022-08-23 17:59:52 +00:00
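The decision described in this commit can be sketched roughly as follows. This is only an illustration, not the actual RowAggStorage code: the function names, the `probability` parameter, and the way the random value is passed in are assumptions made for the example. Only the 30% threshold comes from the commit message.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper (not part of the real code base).
// Returns true when a new generation should be started.
// `coin` is a value in [0, 1) drawn by the caller; `probability` is an
// assumed tuning knob that controls how often the low-memory case fires.
bool shouldStartNewGeneration(std::uint64_t freeBytes, std::uint64_t totalBytes,
                              double coin, double probability = 0.5)
{
  if (totalBytes == 0)
    return false;  // nothing sensible to compute

  // The 30% threshold is the one named in the commit message.
  const bool lowMemory =
      static_cast<double>(freeBytes) / static_cast<double>(totalBytes) < 0.30;

  // Only act on a fraction of the low-memory observations.
  return lowMemory && coin < probability;
}

// Hypothetical helper that draws the random value in [0, 1).
double drawCoin(std::mt19937_64& rng)
{
  return std::uniform_real_distribution<double>(0.0, 1.0)(rng);
}
```

Passing the random value in as an argument keeps the decision itself deterministic and easy to test.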
Roman Nozdrin
fd9fe182d5 MCOL-5199 This patch solves the overall performance degradation introduced with a new way of char columns hashing
in aggregation code
The patch disables padding that forces hasher to calculate over the whole 2k buffer. This patch also moves hashing code
into the common place where it belongs.
2022-08-22 13:39:45 +00:00
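A simplified sketch of the idea behind this commit: hash only the meaningful bytes of a char value instead of the whole fixed-size buffer. FNV-1a is used here purely as a stand-in hash, and the names `fnv1a` and `hashCharColumn` are hypothetical. The real code is collation-aware and is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in hash (64-bit FNV-1a); the engine's real hasher is different.
inline std::uint64_t fnv1a(const char* data, std::size_t len)
{
  std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
  for (std::size_t i = 0; i < len; ++i)
  {
    h ^= static_cast<unsigned char>(data[i]);
    h *= 1099511628211ULL;  // FNV prime
  }
  return h;
}

// Hypothetical helper: hash only the first `payloadLen` bytes of a buffer
// of `bufferLen` bytes, so the cost depends on the value's length and not
// on the buffer's capacity.
inline std::uint64_t hashCharColumn(const char* buffer, std::size_t bufferLen,
                                    std::size_t payloadLen)
{
  const std::size_t len = payloadLen < bufferLen ? payloadLen : bufferLen;
  return fnv1a(buffer, len);
}
```

With this shape, two buffers holding the same value hash identically regardless of what follows the payload, and a 3-byte value costs 3 iterations rather than 2048.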
Alexey Antipovsky
30429a7f6c Fix excessive memory consumption at the last stage of aggregation 2022-08-18 13:58:23 +03:00
Roman Nozdrin
5f485f40ca MCOL-5153 This patch replaces MDB collation aware hash function with the (#2487)
exact functionality that does not use MDB hash function.
This patch also restores a piece of the Robin Hood hash map implementation that had been
left out and that reduces the hash function collision rate.
2022-08-04 16:22:11 -05:00
Roman Nozdrin
eabca67c8d MCOL-5153 This increases the size of the multiplier in the guarding
check in RowAggStorage::increaseSize() so that it doesn't throw w/o a
reason
2022-07-07 08:58:05 +00:00
Leonid Fedorov
7c808317dc clang format apply 2022-02-11 12:24:40 +00:00
David.Hall
509f005be7 Mcol 4841 dev6 Handle large joins without OOM (#2155)
* MCOL-4846 dev-6 Handle large join results
Use a loop to shrink the number of results reported per message to something manageable.

* MCOL-4841 small changes requested by review

* Add EXTRA threads to prioritythreadpool
prioritythreadpool is configured at startup with a fixed number of threads available. This is to prevent thread thrashing. Since most of the time, BPP job steps are short lived, and a rescheduling mechanism exists if no threads are available, this works to keep CPU wastage to a minimum.

However, if a query or queries consume all the threads in prioritythreadpool and then block (due to the consumer not consuming fast enough) we can run out of threads and no work will be done until some threads unblock. A new mechanism allows for EXTRA threads to be generated for the duration of the blocking action. These threads can act on new queries. When all blocking is completed, these threads will be released when idle.

* MCOL-4841 dev6 Reconcile with changes in develop-6

* MCOL-4841 Some format corrections

* MCOL-4841 dev clean up some things based on review

* MCOL-4841 dev 6 ExeMgr Crashes after large join
This commit fixes up memory accounting issues in ExeMgr

* MCOL-4841 remove LDI change
Opened MCOL-4968 to address the issue

* MCOL-4841 Add fMaxBPPSendQueue to ResourceManager
This causes the setting to be loaded at run time (requires restart to accept a change). BPPSendthread gets this in its ctor.
Also rolled back changes to TupleHashJoinStep::smallRunnerFcn() that used a local variable to count locally allocated memory, then added it into the global counter at function's end. Not counting the memory globally caused conversion to UM only join way later than it should. This resulted in MCOL-4971.

* MCOL-4841 make blockedThreads and extraThreads atomic
Also restore previous scope of locks in bppsendthread. There is some small chance the new scope could be incorrect, and the performance boost is negligible. Better safe than sorry.
2022-02-09 21:38:32 +03:00
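The first bullet of the commit above (limiting the number of join results reported per message) can be illustrated with a small loop. This is a minimal sketch under assumptions: the function name `splitIntoMessages` and the fixed `maxPerMessage` cap are hypothetical, and the actual commit may shrink the batch size in a different way.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split `resultCount` results into half-open index
// ranges [begin, end), each covering at most `maxPerMessage` results.
std::vector<std::pair<std::size_t, std::size_t>> splitIntoMessages(
    std::size_t resultCount, std::size_t maxPerMessage)
{
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  if (maxPerMessage == 0)
    return ranges;  // no valid chunk size, nothing to emit

  for (std::size_t begin = 0; begin < resultCount; begin += maxPerMessage)
  {
    // The last chunk may be shorter than maxPerMessage.
    const std::size_t end = std::min(begin + maxPerMessage, resultCount);
    ranges.emplace_back(begin, end);
  }
  return ranges;
}
```

For example, 10 results with a cap of 4 per message yield the ranges [0, 4), [4, 8) and [8, 10).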
Alexey Antipovsky
2328f4ef2a [MCOL-4829] More accurate memory counting 2021-09-07 19:48:53 +03:00
Alexey Antipovsky
bf1640be65 [MCOL-4829] Compression for the temp disk-based aggregation files 2021-09-02 19:31:38 +03:00
Alexey Antipovsky
60495564b8 [MCOL-4709] Fix another UB in disk aggregation 2021-06-29 17:47:07 +03:00
Alexey Antipovsky
8a0b68f25e [MCOL-4709] Fix UB in disk aggregation 2021-06-28 20:07:23 +03:00
Alexey Antipovsky
475104e4d3 [MCOL-4709] Disk-based aggregation
* Introduce multigeneration aggregation

* Do not save unused part of RGDatas to disk
* Add IO error explanation (strerror)

* Reduce memory usage while aggregating
* introduce in-memory generations for better memory utilization

* Try to limit the qty of buckets at a low limit

* Refactor disk aggregation a bit
* pass calculated hash into RowAggregation
* try to keep some RGData with free space in memory

* do not dump more than half of rowgroups to disk if generations are
  allowed, instead start a new generation
* for each thread shift the first processed bucket at each iteration,
  so the generations start more evenly

* Unify temp data location

* Explicitly create temp subdirectories
  whether disk aggregation/join are enabled or not
2021-06-06 16:09:15 +03:00
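The bucket-shifting item in the commit above ("for each thread shift the first processed bucket at each iteration") could look roughly like this. The function name `bucketOrder` and the exact rotation formula are assumptions made for illustration. The commit message states only that the starting bucket shifts per thread and per iteration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the order in which a given thread visits the buckets
// on a given iteration. The starting bucket is rotated by thread id and
// iteration number so that threads do not all begin with the same bucket.
std::vector<std::size_t> bucketOrder(std::size_t numBuckets, std::size_t threadId,
                                     std::size_t iteration)
{
  std::vector<std::size_t> order;
  if (numBuckets == 0)
    return order;
  order.reserve(numBuckets);

  // Reduce each term first so the sum cannot overflow for large inputs.
  const std::size_t first =
      (threadId % numBuckets + iteration % numBuckets) % numBuckets;

  for (std::size_t k = 0; k < numBuckets; ++k)
    order.push_back((first + k) % numBuckets);
  return order;
}
```

With 4 buckets, thread 1 visits 1, 2, 3, 0 on iteration 0 and 2, 3, 0, 1 on iteration 1.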