* MCOL-5092 Ensure column width is correct for datatype
Change MODA return type to STRING
Modify MODA to handle every numeric type
* MCOL-5162 MODA to support char and varchar with collation support
Fixes to the aggregate bit functions
When we fixed the storage sign issue for MCOL-5092, it uncovered a problem in the bit aggregates (bit_and, bit_or and bit_xor). These aggregates should always return UBIGINT, but they relied on the type of the argument column, which gave bad results.
This patch adds support for on clause filter for a table which is not involved in particular join
by disabling an `merge optimization` for those particular cases.
The `merge optimization` is optimization when CS
tries to create a one BPP join with one `large side` table and multiple `small sides` tables, in this
case we cannot apply a FE filter if this filter requires a columns from `small side` table which is not
involved in particular join.
* Restructured test suites and added autopilot and extended suites
* Updated autopilot with correct branch - develop
* Moved setup test case to a 'setup' directory, for consistency
* Fixed a path issue
* Updated some tests cases to keep up with development
Co-authored-by: root <root@rocky8.localdomain>
This patch fixes a wrong `join id` assignment for `TupleHashJoinStep` in a view.
After MCOL-334 CS assigns a '-1' as `join id` for `TupleHashJoinStep` in a view, and
in this case we cannot apply a filter for specific `Join step`, which is associated with `join id`
for 2 reasons:
1. Filters for all `TupleHashJoinSteps` associated with the same `join id`, which is '-1'.
2. When CS creates a `joinIdIndexMap` it eliminates all `join ids` which a less or equal 0.
This patch also fixes some tests for the view, which were generated wrong results.
* MCOL-5074 CASE with IN and aggregate asserts
gwip-scsp wasn't set and buildPredicateItem() was called which assumes it is set. Added code to set properly in this case
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.
The actual patch does have all the code there but miss one important
step: we do not keep collation index, we keep charset index. Because of
this, some of the tests in the bugfix suite fail and thus main
functionality is turned off.
The reason of this patch to be put into PR at all is that it contains
changes that made CHAR/VARCHAR columns unsigned. This change is needed in
vectorization work.
Part 1:
As part of MCOL-3776 to address synchronization issue while accessing
the fTimeZone member of the Func class, mutex locks were added to the
accessor and mutator methods. However, this slows down processing
of TIMESTAMP columns in PrimProc significantly as all threads across
all concurrently running queries would serialize on the mutex. This
is because PrimProc only has a single global object for the functor
class (class derived from Func in utils/funcexp/functor.h) for a given
function name. To fix this problem:
(1) We remove the fTimeZone as a member of the Func derived classes
(hence removing the mutexes) and instead use the fOperationType
member of the FunctionColumn class to propagate the timezone values
down to the individual functor processing functions such as
FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc.
(2) To achieve (1), a timezone member is added to the
execplan::CalpontSystemCatalog::ColType class.
Part 2:
Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime()
and dataconvert::mySQLTimeToGmtSec() functions for conversion between seconds
since unix epoch and broken-down representation. These functions in turn call
the C library function localtime_r() which currently has a known bug of holding
a global lock via a call to __tz_convert. This significantly reduces performance
in multi-threaded applications where multiple threads concurrently call
localtime_r(). More details on the bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=16145
This bug in localtime_r() caused processing of the Functors in PrimProc to
slowdown significantly since a query execution causes Functors code to be
processed in a multi-threaded manner.
As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime()
and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion
(done in dataconvert::timeZoneToOffset()) during the execution plan
creation in the plugin. Note that localtime_r() is only called when the
time_zone system variable is set to "SYSTEM".
This fix also required changing the timezone type from a std::string to
a long across the system.
on a non-ColumnStore table does not work.
As part of MCOL-4617, we moved the in-to-exists predicate creation
and injection from the server into the engine. However, when query
with an IN Subquery contains a non-ColumnStore table, the server
still performs the in-to-exists predicate transformation for the
foreign engine table. This caused ColumnStore's execution plan to
contain incorrect WHERE predicates. As a fix, we call
mutate_optimizer_flags() for the WRITE lock, in addition to the READ
table lock. And in mutate_optimizer_flags(), we change the optimizer
flag from OPTIMIZER_SWITCH_IN_TO_EXISTS to OPTIMIZER_SWITCH_MATERIALIZATION.
wide decimal column in a non-ColumnStore table throws an exception.
ROW::getSignedNullValue() method does not support wide decimal fields
yet. To fix this exception, we remove the call to this method from
CrossEngineStep::setField().
After an AggreateColumn corresponding to SUM(1+1) is created,
it is pushed to the list:
gwi.count_asterisk_list.push_back(ac)
Later, in getSelectPlan(), the expression SUM(1+1) was erroneously
treated as a constant:
if (!hasNonSupportItem && !nonConstFunc(ifp) && !(parseInfo & AF_BIT) && tmpVec.size() == 0)
{
srcp.reset(buildReturnedColumn(item, gwi, gwi.fatalParseError));
This code freed the original AggregateColumn and replaced to a ConstantColumn.
But gwi.count_asterisk_list still pointer to the freed AggregateColumn().
The expression SUM(1+1) was treated as a constant because tmpVec
was empty due to a bug in this code:
// special handling for count(*). This should not be treated as constant.
if (isp->argument_count() == 1 &&
( sfitempp[0]->type() == Item::CONST_ITEM &&
(sfitempp[0]->cmp_type() == INT_RESULT ||
sfitempp[0]->cmp_type() == STRING_RESULT ||
sfitempp[0]->cmp_type() == REAL_RESULT ||
sfitempp[0]->cmp_type() == DECIMAL_RESULT)
)
)
{
field_vec.push_back((Item_field*)item); //dummy
Notice, it handles only aggregate functions with explicit literals
passed as an argument, while it does not handle constant expressions
such as 1+1.
Fix:
- Adding new classes ConstantColumnNull, ConstantColumnString,
ConstantColumnNum, ConstantColumnUInt, ConstantColumnSInt,
ConstantColumnReal, ValStrStdString, to reuse the code easier.
- Moving a part of the code from the case branch handling CONST_ITEM
in buildReturnedColumn() into a new function
newConstantColumnNotNullUsingValNativeNoTz(). This
makes the code easier to read and to reuse in the future.
- Adding a new function newConstantColumnMaybeNullFromValStrNoTz().
Removing dulplicate code from !!!four!!! places, using the new
function instead.
- Adding a function isSupportedAggregateWithOneConstArg() to
properly catch all constant expressions. Using the new function parse_item()
in the code commented as "special handling for count(*)".
Now it pushes all constant expressions to field_vec, not only
explicit literals.
- Moving a part of the code from buildAggregateColumn()
to a helper function processAggregateColumnConstArg().
Using processAggregateColumnConstArg() in the CONST_ITEM
and NULL_ITEM branches.
- Adding a new branch in buildReturnedColumn() handling FUNC_ITEM.
If a function has constant arguments, a ConstantColumn() is
immediately created, without going to
buildArithmeticColumn()/buildFunctionColumn().
- Reusing isSupportedAggregateWithOneConstArg()
and processAggregateColumnConstArg() in buildAggregateColumn().
A new branch catches aggregate function has only one constant argument
and immediately creates a single ConstantColumn without
traversing to the argument sub-components.