mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-07-30 19:23:07 +03:00

[MCOL-4590] UNION Performance Improvement with the focus on the normalize functions.

This patch improves the runtime performance of UNION processing in ColumnStore, as reported in JIRA issue MCOL-4590. The idea of the optimization is to infer the separate normalize function for each column beforehand and then apply those functions individually, instead of running every value through one huge switch body that covers all normalizations. The patch also includes engineering optimizations that remove hotspots in UNION processing. With this patch applied, the normalize part takes only about 25% of the whole UNION query runtime in the average case of our experiments.
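
To illustrate the idea, here is a minimal sketch of inferring one normalize function per column up front and then applying them row by row. It is not the code from this patch: `Cell`, `Row`, `ColType`, `inferNormalizeFunctions`, and the three conversion helpers are hypothetical stand-ins for `rowgroup::Row` and the real conversions. Only the shape of the function type follows `normalizeFunctionsT` from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for ColumnStore column types and rows.
enum class ColType { Int, Double, Text };

struct Cell {
  ColType type = ColType::Int;
  int64_t i = 0;
  double d = 0.0;
  std::string s;
};
using Row = std::vector<Cell>;

// Same shape as normalizeFunctionsT in the diff, with Row simplified.
using NormalizeFn = std::function<void(const Row& in, Row* out, uint32_t col)>;
using NormalizeFns = std::vector<NormalizeFn>;

// One small function per conversion instead of one large switch.
void copyCell(const Row& in, Row* out, uint32_t col) {
  (*out)[col] = in[col];
}

void intToDouble(const Row& in, Row* out, uint32_t col) {
  Cell c;
  c.type = ColType::Double;
  c.d = static_cast<double>(in[col].i);
  (*out)[col] = c;
}

void intToText(const Row& in, Row* out, uint32_t col) {
  Cell c;
  c.type = ColType::Text;
  c.s = std::to_string(in[col].i);
  (*out)[col] = c;
}

// Runs once per input: the type dispatch happens here, not per row.
NormalizeFns inferNormalizeFunctions(const std::vector<ColType>& inTypes,
                                     const std::vector<ColType>& outTypes) {
  assert(inTypes.size() == outTypes.size());
  NormalizeFns fns;
  fns.reserve(inTypes.size());
  for (std::size_t c = 0; c < inTypes.size(); ++c) {
    if (inTypes[c] == outTypes[c])
      fns.push_back(copyCell);
    else if (inTypes[c] == ColType::Int && outTypes[c] == ColType::Double)
      fns.push_back(intToDouble);
    else if (inTypes[c] == ColType::Int && outTypes[c] == ColType::Text)
      fns.push_back(intToText);
    else
      throw std::invalid_argument("conversion not covered by this sketch");
  }
  return fns;
}

// Runs per row: each column calls its pre-selected function directly.
void normalize(const Row& in, Row* out, const NormalizeFns& fns) {
  assert(in.size() == fns.size());
  out->resize(in.size());
  for (uint32_t col = 0; col < fns.size(); ++col)
    fns[col](in, out, col);
}
```

In this sketch the per-row loop no longer inspects column types at all, which is the property the optimization relies on. The cost of the type analysis is paid once per input rather than once per value.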

Signed-off-by: Jigao Luo <luojigao@outlook.com>
Jigao Luo
2022-09-09 14:51:35 +02:00
parent e000236af7
commit 7f97a66184
2 changed files with 1260 additions and 760 deletions

File diff suppressed because it is too large


@@ -41,6 +41,8 @@
 namespace joblist
 {
+using normalizeFunctionsT = std::vector<std::function<void(const rowgroup::Row& in, rowgroup::Row* out, uint32_t col)>>;
 class TupleUnion : public JobStep, public TupleDeliveryStep
 {
public:
@@ -122,8 +124,8 @@ class TupleUnion : public JobStep, public TupleDeliveryStep
 };
 void getOutput(rowgroup::RowGroup* rg, rowgroup::Row* row, rowgroup::RGData* data);
-void addToOutput(rowgroup::Row* r, rowgroup::RowGroup* rg, bool keepit, rowgroup::RGData& data);
-void normalize(const rowgroup::Row& in, rowgroup::Row* out);
+void addToOutput(rowgroup::Row* r, rowgroup::RowGroup* rg, bool keepit, rowgroup::RGData& data, uint32_t& tmpOutputRowCount);
+void normalize(const rowgroup::Row& in, rowgroup::Row* out, const normalizeFunctionsT& normalizeFunctions);
 void writeNull(rowgroup::Row* out, uint32_t col);
 void readInput(uint32_t);
 void formatMiniStats();