Adds a special column which helps to differentiate data and rollups of
various depts and a simple logic to row aggregation to add processing of
subtotals.
This patch:
1. Properly processes situation when pm join result count is exceeded.
2. Adds session variable 'columnstore_max_pm_join_result_count` to control the limit.
This is attempt to make some part of the code more stable.
For some reason we can get a spurious nullptr for boost::shared_ptr
which cause an assert and abort.
In the joblist code, in addition to sending the lbid of the SCAN
column, we also send the corresponding lbid of the AUX column to PrimProc.
In the primitives processor code in PrimProc, we load the AUX column
block (8192 rows since the AUX column is implemented as a 1-byte
UNSIGNED TINYINT) into memory and then pass it down to the low-level
scanning (vectorized scanning as applicable) routine to build a non-Empty
mask for the block being processed to filter out DELETED rows based on
comparison of the AUX block row to the empty magic value for the AUX column.
JobList low-level code relateod to primitive jobs now uses shared pointers instead of ByteStream refs talking to DEC
b/c same-node EM-PP communication now goes over a queue in DEC instead of a network hop.
PP now has a separate thread that processes the primitive job messages from that DEC queue.
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.
The actual patch does have all the code there but miss one important
step: we do not keep collation index, we keep charset index. Because of
this, some of the tests in the bugfix suite fail and thus main
functionality is turned off.
The reason of this patch to be put into PR at all is that it contains
changes that made CHAR/VARCHAR columns unsigned. This change is needed in
vectorization work.
* Fix clang warnings
* Remove vim tab guides
* initialize variables
* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length
* Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison
* chars are unsigned on ARM, having if (ival < 0) always false
* chars are unsigned by default on ARM and comparison with -1 if always true
This patch changes the logic of the `receiveMultiPrimitiveMessages`
function in the following way:
1. We have only one aggregation thread which reads the data from Queue (which is populated
by messages from BPPs).
2. Processing of the received `bytestream vector` could be in parallel depends on the
type of `TupleBPS` operation (join, fe2, ...) and actual thread pool workload.
The motivation is to eliminate some amount of context switches.
This patch:
1. Removes the option to declare uncompressed columns (set columnstore_compression_type = 0).
2. Ignores [COMMENT '[compression=0] option at table or column level (no error messages, just disregard).
3. Removes the option to set more than 2 extents per file (ExtentsPreSegmentFile).
4. Updates rebuildEM tool to support up to 10 dictionary extent per dictionary segment file.
5. Adds check for `DBRootStorageType` for rebuildEM tool.
6. Renamed rebuildEM to mcsRebuildEM.
After creating and populating tables with CHAR(5) case insensitive columns,
in a set of consequent joins like:
select * from t1, t2 where t1.c1=t2.c1;
select * from t1, t2 where t1.c1=t2.c2;
select * from t1, t2 where t1.c2=t2.c1;
select * from t1, t2 where t1.c2=t2.c2;
only the first join worked reliably case insensitively.
Removing the remaining pieces of the code that used order_swap() to compare
short CHAR columns, and using Charset::strnncollsp() instead.
This fixes the issue.
For now it consists of only:
using int128_t = __int128;
using uint128_t = unsigned __int128;
All new privitive data types should go into this file in the future.
This commit also adds support in TupleHashJoinStep::forwardCPData,
although we currently do not support wide decimals as join keys.
Row estimation to determine large-side of the join is also updated.
MCOL-894 Add default values in Compare and CSEP ctors to activate UTF-8 sorting
properly.
MCOL-894 Unit tests to build a framework for a new parallel sorting.
MCOL-894 Finished with parallel workers invocation.
The implementation lacks final aggregation step.
MCOL-894 TupleAnnexStep's init and destructor are now parallel execution aware.
Implemented final merging step for parallel execution finalizeParallelOrderBy().
Templated unit test to use it with arbitrary number of rows, threads.
Reuse LimitedOrderBy in the final step
MCOL-894 Cleaned up finalizeParallelOrderBy.
MCOL-894 Add and propagate thread variable that controls a number of threads.
Optimized comparators used for sorting and add corresponding UTs.
Refactored TupleAnnexStep::finalizeParallelOrderByDistinct.
Parallel sorting methods now preallocates memory in batches.
MCOL-894 Fixed comparator for StringCompare.
This does the following:
* Switch resource manager to a singleton which reduces the amount of
times the XML data is scanned and objects allocated.
* Make the I_S tables use the FE implementation of the system catalog
* Make the I_S.columnstore_columns table use the RID list cache
* Make the extentmap pre-allocate a vector instead of many small allocs