mirror of https://github.com/mariadb-corporation/mariadb-columnstore-engine.git synced 2025-04-18 21:44:02 +03:00
mariadb-columnstore-engine/primitives/primproc/batchprimitiveprocessor.h
Patrick LeBlanc 1eaa83d852 Squash merge of the multithreaded PM join code.
Squashed commit of the following:

commit fe4cc375faf1588e30471062f78403e81229cd02
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Fri Nov 1 13:38:11 2019 -0400

    Added some code comments to the new join code.

commit a7a82d093be4db3dfb44d33e4f514fd104b25f71
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Fri Nov 1 13:17:47 2019 -0400

    Fixed an error down a path I think is unused.

commit 4e6c7c266a9aefd54c384ae2b466645770c81a5d
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Fri Nov 1 13:12:12 2019 -0400

    std::atomic doesn't exist in C7, -> boost::atomic.

commit ed0996c3f4548fff0e19d43852d429ada1a72510
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Wed Oct 16 12:47:32 2019 -0500

    Addition to the previous fix (join dependency projection).

commit 97bb806be9211e4688893460437f539c46f3796f
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Tue Oct 15 15:22:09 2019 -0500

    Found and fixed a bad mem access, which may have been there for 8 years.

commit d8b0432d2abd70f28de5276daad758c494e4b04b
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Tue Oct 15 14:04:48 2019 -0500

    Minor optimization in some code I happened to look at.

commit b6ec8204bf71670c7a8882464289e700aa5f7e33
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Tue Oct 15 14:04:11 2019 -0500

    Fixed a compiler warning.

commit 0bf3e5218f71d92460ddc88090e3af77ecf28c35
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Tue Oct 15 10:11:09 2019 -0500

    Undid part of the previous commit.

commit 5dfa1d23980e245c77c1644015b553aa4bcdf908
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Mon Oct 14 18:00:21 2019 -0500

    Proofread the diff vs base, added some comments, removed some debugging stuff.

commit 411fd955ebbae97ddab210a7b17fe5708538001d
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Fri Oct 11 13:55:39 2019 -0500

    If a dev build (SKIP_OAM_INIT), made postConfigure exit before trying
    to start the system, because that won't work.

commit 634b1b8a7340b55fcaee045fd6d00b3e3a9269fa
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Mon Sep 30 14:55:45 2019 -0500

    Reduced crit section of BPP::addToJoiner a little.

commit 31f30c64dd95942f2c7a247cc81feaa5933c1a07
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Wed Sep 18 11:09:27 2019 -0500

    Checkpointing.  make the add joiner stuff free tmp mem quickly.

commit 9b7e788690546af7ddc4c921a0ab441ee9a8df02
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Wed Sep 18 10:38:57 2019 -0500

    Checkpoint.  Removed tmp hardcoding of bucket count.

commit fda4d8b7fb30d0431dc15e473042abb3d8121b19
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Wed Sep 18 10:20:09 2019 -0500

    Checkpoint.  Adjusted unproductive loop wait time.

commit 7b9a67df7d192f240e9e558e6e66c7aa9f1e8687
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Wed Sep 18 10:10:43 2019 -0500

    Checkpointing add'l optimizations.

    If we promote bpp::processorThreads / bucket count to a power of 2, we can
    use a bitmask instead of a mod operation to decide a bucket.
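As a rough illustration of the bitmask idea described above, the sketch below rounds a thread count up to a power of two and then selects a bucket with `hash & mask`. The names `roundUpToPowerOf2` and `pickBucket` are hypothetical and are not taken from this commit; the real code may compute its mask differently.

```cpp
#include <cstdint>

// Hypothetical helper: round n up to the next power of two.
// Returns 1 for n == 0. Assumes n <= 2^31 so the shift cannot overflow.
inline uint32_t roundUpToPowerOf2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n && p < 0x80000000u)
        p <<= 1;
    return p;
}

// Hypothetical helper: choose a bucket from a hash value.
// When bucketCount is a power of two and mask == bucketCount - 1,
// (hash & mask) gives the same result as (hash % bucketCount).
inline uint32_t pickBucket(uint32_t hash, uint32_t mask)
{
    return hash & mask;
}
```

For example, a configured thread count of 6 would be promoted to 8 buckets, giving a mask of 7.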

    Also, boosted utilization by not waiting for a bucket lock to become free.
    There are likely more gains to be had there with a smarter strategy.
    Maybe have each thread generate a random bucket access pattern to reduce
    chance of collision.  TBD.
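The "don't wait for a bucket lock" strategy might look roughly like the sketch below. It is an assumption-laden illustration, not the code from this commit: it uses `std::mutex` instead of `boost::mutex` to stay self-contained, plain `std::vector<int>` buckets instead of hash tables, and the function name `insertWithoutWaiting` is hypothetical.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Move each pending[i] into tables[i], where tables[i] is guarded by locks[i].
// A bucket whose lock is currently held by another thread is skipped and
// retried on a later pass, so the caller never blocks on a busy bucket.
inline void insertWithoutWaiting(std::vector<std::vector<int> >& pending,
                                 std::vector<std::vector<int> >& tables,
                                 std::vector<std::mutex>& locks)
{
    std::size_t remaining = 0;
    for (std::size_t i = 0; i < pending.size(); ++i)
        if (!pending[i].empty())
            ++remaining;

    while (remaining > 0)
    {
        bool progressed = false;
        for (std::size_t i = 0; i < pending.size(); ++i)
        {
            if (pending[i].empty())
                continue;
            std::unique_lock<std::mutex> lk(locks[i], std::try_to_lock);
            if (!lk.owns_lock())
                continue;  // bucket is busy; move on to the next one
            tables[i].insert(tables[i].end(), pending[i].begin(), pending[i].end());
            pending[i].clear();
            --remaining;
            progressed = true;
        }
        if (!progressed)
            std::this_thread::yield();  // unproductive pass; back off briefly
    }
}
```

Each thread would call this with its own `pending` batches while sharing `tables` and `locks`. Starting each thread's scan at a different bucket is one way the collision rate mentioned above could be reduced further.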

commit abe7dab8661b5120f6ee268abc005dd66cd643e2
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Tue Sep 17 16:15:51 2019 -0500

    Multithreaded PM hash table construction likely works here.

    A couple more fixes.
     - missed a mod after a hash in one place.
     - Made the PoolAllocator thread safe (small degree of performance hit
       there in threaded env).  May need to circle back to the table
       construction code to eliminate contention for the allocators instead.

commit ab308762fbd873dbf246a6d1574223087cd0d5f6
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Tue Sep 17 12:14:14 2019 -0500

    Checkpointing.  Did some initial testing, fixed a couple things.

    Not done testing yet.

commit 3b161d74fa859edb8b5ba84bb905e586ac0586e6
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Tue Sep 17 11:24:55 2019 -0500

    Checkpointing.  First cut of multithreaded PM join table building.

    Builds but is untested.

commit cb7e6e1c2761fc6c33b3b1c6b6cda488d7792bca
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Mon Sep 16 13:03:50 2019 -0500

    Increase the STLPoolAllocator window size to reduce destruction time.

commit b0ddaaae71a0a4959ad15c87579d85ed88e17e1f
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Fri Sep 13 11:52:51 2019 -0500

    Fixed a bug preventing parallel table loading.  works now.

commit b87039604e312c1ddb88cdb226228b1c3addf018
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Thu Sep 12 22:04:15 2019 -0500

    Checkpointing some experimental changes.

     - Made the allocator type used by PM joins the STLPoolAllocator
     - Changed the default chunk size used by STLPoolAlloc based on a few test
        runs
     - Made BPP-JL interleave the PM join data by join # to take advantage
        of new locking env on PM.
     - While I was at it, fixed MCOL-1758.

commit fd4b09cc383d2b96959a8e5ca490c940bacb3d37
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Thu Sep 12 16:03:30 2019 -0500

    Speculative change.  Row estimator was stopping at 20 extents.

    Removed that limitation.

commit 7dcdd5b5455f9ac06121dd3cf1ba722150f3ee56
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Thu Sep 5 09:10:28 2019 -0500

    Inlined some hot simpleallocator fcns.

commit 6d84daceecc5499f6286cf3468c118b8b1d28d8f
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Wed Sep 4 17:02:29 2019 -0500

    Some optimizations to PM hash table creation.

    - made locks more granular.
    - reduced logic per iteration when adding elements.

commit b20bf54ed97c5a0a88d414a4dd844a0afc2e27f3
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Wed Sep 4 15:32:32 2019 -0500

    Reduced granularity of djLock in PrimProc.

commit 6273a8f3c4c62b87ef91c77a829033426e38e4d4
Author: Patrick LeBlanc <patrick.leblanc@mariadb.com>
Date:   Wed Sep 4 14:45:58 2019 -0500

    Added a timer to PM hash table construction

    signal USR1 will print cumulative wall time to stdout & reset the timer.
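One way a USR1-triggered cumulative timer could be structured is sketched below. This is an assumption, not the commit's implementation: all names (`g_totalNanos`, `onUsr1`, `ScopedWallTimer`, `reportIfRequested`) are hypothetical, and the handler only sets a flag because `printf` is not async-signal-safe. It relies on POSIX for `SIGUSR1`.

```cpp
#include <atomic>
#include <chrono>
#include <csignal>
#include <cstdint>
#include <cstdio>

// Cumulative wall time spent in timed sections, in nanoseconds.
static std::atomic<int64_t> g_totalNanos(0);
// Set by the signal handler; polled by normal code.
static volatile std::sig_atomic_t g_dumpRequested = 0;

// Install with std::signal(SIGUSR1, onUsr1).
extern "C" void onUsr1(int)
{
    g_dumpRequested = 1;
}

// Adds the lifetime of the object to the cumulative total.
class ScopedWallTimer
{
public:
    ScopedWallTimer() : start(std::chrono::steady_clock::now()) { }
    ~ScopedWallTimer()
    {
        auto elapsed = std::chrono::steady_clock::now() - start;
        g_totalNanos.fetch_add(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
    }
private:
    std::chrono::steady_clock::time_point start;
};

// If a dump was requested, print the total to stdout, reset it, and return
// the value that was printed. Returns -1 when no request is pending.
inline int64_t reportIfRequested()
{
    if (!g_dumpRequested)
        return -1;
    g_dumpRequested = 0;
    int64_t total = g_totalNanos.exchange(0);
    std::printf("cumulative wall time: %.6f s\n", total / 1e9);
    return total;
}
```

A worker would wrap the table-building section in a `ScopedWallTimer` and call `reportIfRequested()` periodically from a non-signal context.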
2019-11-01 17:34:33 -04:00


/* Copyright (C) 2014 InfiniDB, Inc.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; version 2 of
the License.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
MA 02110-1301, USA. */
//
// $Id: batchprimitiveprocessor.h 2132 2013-07-17 20:06:10Z pleblanc $
// C++ Interface: batchprimitiveprocessor
//
// Description:
//
//
// Author: Patrick LeBlanc <pleblanc@calpont.com>, (C) 2008
//
// Copyright: See COPYING file that comes with this distribution
//
//
#ifndef BATCHPRIMITIVEPROCESSOR_H_
#define BATCHPRIMITIVEPROCESSOR_H_
#include <boost/scoped_array.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/scoped_ptr.hpp>
#ifndef _MSC_VER
#include <tr1/unordered_map>
#else
#include <unordered_map>
#endif
#include <boost/thread.hpp>
#include "errorcodes.h"
#include "serializeable.h"
#include "messagequeue.h"
#include "primitiveprocessor.h"
#include "command.h"
#include "umsocketselector.h"
#include "tuplejoiner.h"
#include "rowgroup.h"
#include "rowaggregation.h"
#include "funcexpwrapper.h"
#include "bppsendthread.h"
namespace primitiveprocessor
{
typedef std::tr1::unordered_map<int64_t, BRM::VSSData> VSSCache;
}
#include "primitiveserver.h"
namespace primitiveprocessor
{
typedef boost::shared_ptr<BatchPrimitiveProcessor> SBPP;
class scalar_exception : public std::exception
{
public:
// what() must be public to match the std::exception interface.
const char* what() const throw()
{
return "Not a scalar subquery.";
}
};
class NeedToRestartJob : public std::runtime_error
{
public:
NeedToRestartJob() : std::runtime_error("NeedToRestartJob") { }
NeedToRestartJob(const std::string& s) :
std::runtime_error(s) { }
};
class BatchPrimitiveProcessor
{
public:
BatchPrimitiveProcessor(messageqcpp::ByteStream&, double prefetchThresh,
boost::shared_ptr<BPPSendThread>, uint processorThreads);
~BatchPrimitiveProcessor();
/* Interface used by primproc */
void initBPP(messageqcpp::ByteStream&);
void resetBPP(messageqcpp::ByteStream&, const SP_UM_MUTEX& wLock, const SP_UM_IOSOCK& outputSock);
void addToJoiner(messageqcpp::ByteStream&);
int endOfJoiner();
void doneSendingJoinerData();
int operator()();
void setLBIDForScan(uint64_t rid);
/* duplicate() returns a deep copy of this object as it was initialized by initBPP().
It is thread-safe with respect to resetBPP(). */
SBPP duplicate();
/* These need to be updated */
//bool operator==(const BatchPrimitiveProcessor&) const;
//inline bool operator!=(const BatchPrimitiveProcessor& bpp) const
//{
// return !(*this == bpp);
//}
inline uint32_t getSessionID()
{
return sessionID;
}
inline uint32_t getStepID()
{
return stepID;
}
inline uint32_t getUniqueID()
{
return uniqueID;
}
inline bool busy()
{
return fBusy;
}
inline void busy(bool b)
{
fBusy = b;
}
uint16_t FilterCount() const
{
return filterCount;
}
uint16_t ProjectCount() const
{
return projectCount;
}
uint32_t PhysIOCount() const
{
return physIO;
}
uint32_t CachedIOCount() const
{
return cachedIO;
}
uint32_t BlocksTouchedCount() const
{
return touchedBlocks;
}
void setError(const std::string& error, logging::ErrorCodeValues errorCode) {}
// these two functions are used by BPPV to create BPP instances
// on demand. TRY not to use unlock() for anything else.
void unlock()
{
pthread_mutex_unlock(&objLock);
}
bool hasJoin()
{
return doJoin;
}
private:
BatchPrimitiveProcessor();
BatchPrimitiveProcessor(const BatchPrimitiveProcessor&);
BatchPrimitiveProcessor& operator=(const BatchPrimitiveProcessor&);
void initProcessor();
#ifdef PRIMPROC_STOPWATCH
void execute(logging::StopWatch* stopwatch);
#else
void execute();
#endif
void writeProjectionPreamble();
void makeResponse();
void sendResponse();
/* Used by scan operations to increment the LBIDs in successive steps */
void nextLBID();
/* these send relative rids, should this be abs rids? */
void serializeElementTypes();
void serializeStrings();
void asyncLoadProjectColumns();
void writeErrorMsg(const std::string& error, uint16_t errCode, bool logIt = true, bool critical = true);
BPSOutputType ot;
BRM::QueryContext versionInfo;
uint32_t txnID;
uint32_t sessionID;
uint32_t stepID;
uint32_t uniqueID;
// # of times to loop over the command arrays
// ... This is 1, except when the first command is a scan, in which case
// this single BPP object produces count responses.
uint16_t count;
uint64_t baseRid; // first rid of the logical block
uint16_t relRids[LOGICAL_BLOCK_RIDS];
int64_t values[LOGICAL_BLOCK_RIDS];
boost::scoped_array<uint64_t> absRids;
boost::scoped_array<std::string> strValues;
uint16_t ridCount;
bool needStrValues;
/* Common space for primitive data */
static const uint32_t BUFFER_SIZE = 65536;
uint8_t blockData[BLOCK_SIZE * 8];
boost::scoped_array<uint8_t> outputMsg;
uint32_t outMsgSize;
std::vector<SCommand> filterSteps;
std::vector<SCommand> projectSteps;
//@bug 1136
uint16_t filterCount;
uint16_t projectCount;
bool sendRidsAtDelivery;
uint8_t ridMap;
bool gotAbsRids;
bool gotValues;
bool hasScan;
bool validCPData;
int64_t minVal, maxVal; // CP data from a scanned column
uint64_t lbidForCP;
// IO counters
boost::mutex counterLock;
uint32_t busyLoaderCount;
uint32_t physIO, cachedIO, touchedBlocks;
SP_UM_IOSOCK sock;
messageqcpp::SBS serialized;
SP_UM_MUTEX writelock;
// MCOL-744: using a pthread mutex instead of a Boost mutex because
// it is possible that this lock could be unlocked when it is
// already unlocked. In Ubuntu 16.04's Boost this triggers a
// crash. Whilst it is very hard to hit this, it is still bad.
// Longer term TODO: fix/remove objLock and/or refactor BPP
pthread_mutex_t objLock;
bool LBIDTrace;
bool fBusy;
/* Join support. TODO: Make join ops a separate Command class. */
boost::shared_ptr<joiner::Joiner> joiner;
std::vector<joblist::ElementType> smallSideMatches;
bool doJoin;
uint32_t joinerSize;
uint16_t preJoinRidCount;
boost::scoped_array<boost::scoped_array<boost::mutex> > addToJoinerLocks;
boost::scoped_array<boost::mutex> smallSideDataLocks;
void executeJoin();
// uint32_t ridsIn, ridsOut;
//@bug 1051 FilterStep on PM
bool hasFilterStep;
bool filtOnString;
boost::scoped_array<uint16_t> fFiltCmdRids[2];
boost::scoped_array<int64_t> fFiltCmdValues[2];
boost::scoped_array<std::string> fFiltStrValues[2];
uint64_t fFiltRidCount[2];
// query density threshold for prefetch & async loading
double prefetchThreshold;
/* RowGroup support */
rowgroup::RowGroup outputRG;
boost::scoped_ptr<rowgroup::RGData> outRowGroupData;
boost::shared_array<int> rgMap; // maps input cols to output cols
boost::shared_array<int> projectionMap; // maps the projection steps to the output RG
bool hasRowGroup;
/* Rowgroups + join */
typedef std::tr1::unordered_multimap<uint64_t, uint32_t,
joiner::TupleJoiner::hasher, std::equal_to<uint64_t>,
utils::STLPoolAllocator<std::pair<const uint64_t, uint32_t> > > TJoiner;
typedef std::tr1::unordered_multimap<joiner::TypelessData,
uint32_t, joiner::TupleJoiner::hasher, std::equal_to<joiner::TypelessData>,
utils::STLPoolAllocator<std::pair<const joiner::TypelessData, uint32_t> > > TLJoiner;
bool generateJoinedRowGroup(rowgroup::Row& baseRow, const uint32_t depth = 0);
/* generateJoinedRowGroup helper fcns & vars */
void initGJRG(); // called once after joining
void resetGJRG(); // called after every rowgroup returned by generateJoinedRowGroup()
boost::scoped_array<uint32_t> gjrgPlaceHolders;
uint32_t gjrgRowNumber;
bool gjrgFull;
rowgroup::Row largeRow, joinedRow, baseJRow;
boost::scoped_array<uint8_t> baseJRowMem;
boost::scoped_ptr<rowgroup::RGData> joinedRGMem;
boost::scoped_array<rowgroup::Row> smallRows;
boost::shared_array<boost::shared_array<int> > gjrgMappings;
boost::shared_array<boost::shared_array<boost::shared_ptr<TJoiner> > > tJoiners;
typedef std::vector<uint32_t> MatchedData[LOGICAL_BLOCK_RIDS];
boost::shared_array<MatchedData> tSmallSideMatches;
void executeTupleJoin();
bool getTupleJoinRowGroupData;
std::vector<rowgroup::RowGroup> smallSideRGs;
rowgroup::RowGroup largeSideRG;
boost::shared_array<rowgroup::RGData> smallSideRowData;
boost::shared_array<rowgroup::RGData> smallNullRowData;
boost::shared_array<rowgroup::Row::Pointer> smallNullPointers;
boost::shared_array<uint64_t> ssrdPos; // this keeps track of position when building smallSideRowData
boost::shared_array<uint32_t> smallSideRowLengths;
boost::shared_array<joblist::JoinType> joinTypes;
uint32_t joinerCount;
boost::shared_array<uint32_t> tJoinerSizes;
// largeSideKeyColumns[i] = the column in outputRG that joiner i uses as its key column
boost::shared_array<uint32_t> largeSideKeyColumns;
// keyColumnProj[i] == true means a joiner uses projection step i as a key column
boost::shared_array<bool> keyColumnProj;
rowgroup::Row oldRow, newRow; // used by executeTupleJoin()
boost::shared_array<uint64_t> joinNullValues;
boost::shared_array<bool> doMatchNulls;
boost::scoped_array<boost::scoped_ptr<funcexp::FuncExpWrapper> > joinFEFilters;
bool hasJoinFEFilters;
bool hasSmallOuterJoin;
/* extra typeless join vars & fcns*/
boost::shared_array<bool> typelessJoin;
boost::shared_array<std::vector<uint32_t> > tlLargeSideKeyColumns;
boost::shared_array<boost::shared_array<boost::shared_ptr<TLJoiner> > > tlJoiners;
boost::shared_array<uint32_t> tlKeyLengths;
inline void getJoinResults(const rowgroup::Row& r, uint32_t jIndex, std::vector<uint32_t>& v);
// these allocators hold the memory for the keys stored in tlJoiners
boost::shared_array<utils::PoolAllocator> storedKeyAllocators;
// these allocators hold the memory for the large side keys which are short-lived
boost::scoped_array<utils::FixedAllocator> tmpKeyAllocators;
/* PM Aggregation */
rowgroup::RowGroup joinedRG; // if there's a join, the rows are formatted with this
rowgroup::SP_ROWAGG_PM_t fAggregator;
rowgroup::RowGroup fAggregateRG;
rowgroup::RGData fAggRowGroupData;
//boost::scoped_array<uint8_t> fAggRowGroupData;
/* OR hacks */
uint8_t bop; // BOP_AND or BOP_OR
bool hasPassThru;
uint8_t forHJ;
boost::scoped_ptr<funcexp::FuncExpWrapper> fe1, fe2;
rowgroup::RowGroup fe1Input, fe2Output, *fe2Input;
// note, joinFERG is only for metadata, and is shared between BPPs
boost::shared_ptr<rowgroup::RowGroup> joinFERG;
boost::scoped_array<uint8_t> joinFERowData;
boost::scoped_ptr<rowgroup::RGData> fe1Data, fe2Data; // can probably make these RGDatas not pointers to RGDatas
boost::shared_array<int> projectForFE1;
boost::shared_array<int> fe1ToProjection, fe2Mapping; // RG mappings
boost::scoped_array<boost::shared_array<int> > joinFEMappings;
rowgroup::Row fe1In, fe1Out, fe2In, fe2Out, joinFERow;
bool hasDictStep;
primitives::PrimitiveProcessor pp;
/* VSS cache members */
VSSCache vssCache;
void buildVSSCache(uint32_t loopCount);
/* To support limited DEC queues on the PM */
boost::shared_ptr<BPPSendThread> sendThread;
bool newConnection; // to support the load balancing code in sendThread
/* To support reentrancy */
uint32_t currentBlockOffset;
boost::scoped_array<uint64_t> relLBID;
boost::scoped_array<bool> asyncLoaded;
/* To support a smaller memory footprint when idle */
static const uint64_t maxIdleBufferSize = 16 * 1024 * 1024; // arbitrary
void allocLargeBuffers();
void freeLargeBuffers();
/* To ensure all packets of an LBID go out the same socket */
int sockIndex;
/* Shared nothing vars */
uint32_t dbRoot;
bool endOfJoinerRan;
/* Some addToJoiner() profiling stuff */
boost::posix_time::ptime firstCallTime;
utils::Hasher_r bucketPicker;
const uint32_t bpSeed = 0xf22df448; // an arbitrary random #
uint processorThreads;
uint ptMask;
bool firstInstance;
friend class Command;
friend class ColumnCommand;
friend class DictStep;
friend class PassThruCommand;
friend class RTSCommand;
friend class FilterCommand;
friend class ScaledFilterCmd;
friend class StrFilterCmd;
friend class PseudoCC;
};
}
#endif