From 86c1c5d53700fef665a6c7fe16f608c49256d965 Mon Sep 17 00:00:00 2001 From: Leonid Fedorov <79837786+mariadb-LeonidFedorov@users.noreply.github.com> Date: Sat, 30 Sep 2023 00:02:31 +0300 Subject: [PATCH] fix(rgdata)!: Fix assertion failure leading to disk-based aggregation failure The new added invariant checking that RGData knows the number of columns and fixed size columns was failing for disk-based aggregation workloads, leading them to provide a wrong result. (The assertion failure happened in RGData::getRow(uint32_t num, Row* row) which is called in the finalization of sub-aggregation results, necessary for merging part results. As the merging failed, duplicate results were output for disk-based aggregation queries. The assertion failure was caused by RGData::deserialize(ByteStream& bs, uint32_t defAmount) not setting rowSize and colCount if necessary (e.g. when the deserialization happens into a new, default RGData, which doesn't know anything about its structure yet. This is the case when the default constructor for RGData() is used, which sets rowSize and columnCount to 0 each. There are three code parts that make use of the default RGData() ctor. The fix is for the use in RowGroupStorage::loadRG(uint64_t rgid, std::unique_ptr& rgdata, bool unlinkDump = false), where the default RGData object is used to directly deserialize a ByteStream into it. The deserialize method now checks if both rowSize and columnCount are 0 and if yes sets the read values from the ByteStream for both. We should probably check the other two code parts making use of the default RGData ctor, too. This happens in joinpartition.cpp and tuplejoiner.cpp. --------- Co-authored-by: Theresa Hradilak <34538290+phoeinx@users.noreply.github.com> --- utils/rowgroup/rowgroup.cpp | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/utils/rowgroup/rowgroup.cpp b/utils/rowgroup/rowgroup.cpp index 344e51f7d..5d05b9cdc 100644 --- a/utils/rowgroup/rowgroup.cpp +++ b/utils/rowgroup/rowgroup.cpp @@ -414,9 +414,17 @@ void RGData::deserialize(ByteStream& bs, uint32_t defAmount) uint32_t rowSizeTemp; bs >> colCountTemp; bs >> rowSizeTemp; - if (rowSize != 0) + if (rowSize != 0 || columnCount != 0) { - idbassert(colCountTemp == columnCount && rowSize == rowSizeTemp); + idbassert(rowSize == rowSizeTemp && colCountTemp == columnCount); + } + else + { + // if deserializing into an empty RGData created by default constructor + // which sets columnCount = 0 and rowSize = 0, set columnCount and rowSize + // from deserialized bytestream + columnCount = colCountTemp; + rowSize = rowSizeTemp; } rowData.reset(new uint8_t[std::max(amount, defAmount)]); buf = bs.buf();