[MCOL-4709] Disk-based aggregation

* Introduce multigeneration aggregation
* Do not save the unused part of RGDatas to disk
* Add IO error explanation (strerror)
* Reduce memory usage while aggregating
* Introduce in-memory generations for better memory utilization
* Try to keep the number of buckets low
* Refactor disk aggregation a bit
* Pass the calculated hash into RowAggregation
* Try to keep some RGData with free space in memory
* Do not dump more than half of the rowgroups to disk if generations are allowed; start a new generation instead
* For each thread, shift the first processed bucket at each iteration so the generations start more evenly
* Unify temp data location
* Explicitly create temp subdirectories whether or not disk aggregation/join are enabled
2025-07-30 19:23:07 +03:00 · 2021-01-15 18:52:13 +03:00
parent 3537c0d635
commit 475104e4d3
24 changed files with 5932 additions and 906 deletions
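The commit message names the techniques but does not show them. As a rough illustration only, the C++ sketch below shows the general idea of hash-bucketed aggregation that spills to disk in "generations" when an in-memory limit is hit, then merges one bucket at a time. It is not the ColumnStore implementation: the class `SpillingSumAggregator`, its parameters, the SUM-only aggregate, and the use of `std::tmpfile` as the spill target are all assumptions made for this example.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not the ColumnStore code: SUM(value) GROUP BY key.
// Keys are routed to a fixed number of buckets by hash. Each bucket keeps at
// most maxGroupsPerBucket groups in memory; when a new group would exceed that
// cap, the bucket's table is written to a temp file (a new "generation") and
// the in-memory table starts over empty.
class SpillingSumAggregator {
 public:
  SpillingSumAggregator(std::size_t buckets, std::size_t maxGroupsPerBucket)
      : maxGroups_(maxGroupsPerBucket), buckets_(buckets) {}
  SpillingSumAggregator(const SpillingSumAggregator&) = delete;
  SpillingSumAggregator& operator=(const SpillingSumAggregator&) = delete;

  ~SpillingSumAggregator() {
    for (auto& b : buckets_)
      for (FILE* f : b.generations) std::fclose(f);
  }

  void add(int64_t key, int64_t value) {
    // Route the key to a bucket by its hash.
    Bucket& b = buckets_[std::hash<int64_t>{}(key) % buckets_.size()];
    if (b.table.find(key) == b.table.end() && b.table.size() >= maxGroups_)
      dumpGeneration(b);
    b.table[key] += value;
  }

  // Merge one bucket at a time (its generations plus its in-memory table) so
  // peak memory is bounded by one bucket's distinct groups, not all groups.
  void finish(const std::function<void(int64_t, int64_t)>& emit) {
    for (auto& b : buckets_) {
      std::unordered_map<int64_t, int64_t> merged = std::move(b.table);
      b.table.clear();
      for (FILE* f : b.generations) {
        std::rewind(f);
        Entry e;
        while (std::fread(&e, sizeof e, 1, f) == 1) merged[e.key] += e.sum;
        std::fclose(f);
      }
      b.generations.clear();
      for (const auto& [k, s] : merged) emit(k, s);
    }
  }

  std::size_t generationCount() const {
    std::size_t n = 0;
    for (const auto& b : buckets_) n += b.generations.size();
    return n;
  }

 private:
  struct Entry { int64_t key; int64_t sum; };
  struct Bucket {
    std::unordered_map<int64_t, int64_t> table;
    std::vector<FILE*> generations;
  };

  static void dumpGeneration(Bucket& b) {
    FILE* f = std::tmpfile();
    if (!f) throw std::runtime_error(std::string("tmpfile: ") + std::strerror(errno));
    for (const auto& [k, s] : b.table) {
      Entry e{k, s};
      if (std::fwrite(&e, sizeof e, 1, f) != 1) {
        int err = errno;  // report the IO error text, as the commit does with strerror
        std::fclose(f);
        throw std::runtime_error(std::string("fwrite: ") + std::strerror(err));
      }
    }
    b.generations.push_back(f);
    b.table.clear();
  }

  std::size_t maxGroups_;
  std::vector<Bucket> buckets_;
};
```

The same key can appear in several generations of its bucket with partial sums, which is why the merge step adds rather than overwrites. Because a key always hashes to the same bucket, buckets never need to be merged with each other.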
--- a/utils/common/robin_hood.h
+++ b/utils/common/robin_hood.h
--- a/utils/common/threadnaming.cpp
+++ b/utils/common/threadnaming.cpp
@@ -16,6 +16,7 @@
   MA 02110-1301, USA. */

 #include <sys/prctl.h>
+#include "threadnaming.h"

 namespace utils
 {
@@ -23,4 +24,11 @@ namespace utils
    {
        prctl(PR_SET_NAME, threadName, 0, 0, 0);
    }
+
+    std::string getThreadName()
+    {
+      char buf[32];
+      prctl(PR_GET_NAME, buf, 0, 0, 0);
+      return std::string(buf);
+    }
 } // end of namespace
--- a/utils/common/threadnaming.h
+++ b/utils/common/threadnaming.h
@@ -17,8 +17,11 @@
 #ifndef H_SETTHREADNAME
 #define H_SETTHREADNAME

+#include <string>
+
 namespace utils
 {
    void setThreadName(const char *threadName);
+    std::string getThreadName();
 } // end of namespace
 #endif
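The new `getThreadName()` reads back the name that `setThreadName()` stores via `prctl`. A minimal standalone sketch of the same pair is below, assuming Linux; the `demo` namespace is hypothetical and only mirrors the functions in the diff. It illustrates one detail worth knowing: the kernel keeps at most 16 bytes including the terminating NUL, so longer names are silently truncated to 15 characters.

```cpp
#include <sys/prctl.h>
#include <string>

// Hypothetical standalone mirror of utils::setThreadName/getThreadName (Linux only).
namespace demo {

void setThreadName(const char* name) {
  // The kernel truncates names longer than 15 characters (16 bytes with NUL).
  prctl(PR_SET_NAME, name, 0UL, 0UL, 0UL);
}

std::string getThreadName() {
  char buf[16] = {};  // PR_GET_NAME writes at most 16 bytes, NUL-terminated
  if (prctl(PR_GET_NAME, buf, 0UL, 0UL, 0UL) != 0) return std::string();
  return std::string(buf);
}

}  // namespace demo
```

Zero-initializing the buffer means a failed `prctl` call yields an empty string rather than uninitialized bytes; the 32-byte buffer in the diff is simply larger than the kernel ever fills.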