This patch improves handling of NULLs in textual fields in ColumnStore.
Previously, empty strings were treated as NULLs, which could be a problem
if the data schema allows empty strings. It was also one of the major
reasons for behavioral differences between ColumnStore and other engines
in the MariaDB family.
This patch also fixes some other bugs and incorrect behavior, for
example, the incorrect comparison "column <= ''", which before this
patch evaluated to constant true for all purposes.
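The semantics the patch aims for can be illustrated with a minimal sketch that keeps NULL and the empty string distinct. Under these semantics, `column <= ''` is true only for empty strings, false for non-empty ones, and unknown for NULL. `NullableString`, `TriBool` and `lessOrEqual` are hypothetical names, not ColumnStore's internal representation, and the sketch assumes a plain binary comparison (PAD SPACE collations, which ignore trailing spaces, are not modeled).

```cpp
#include <cassert>
#include <optional>
#include <string>

// Three-valued result of an SQL comparison: TRUE, FALSE or UNKNOWN (NULL).
enum class TriBool { False, True, Unknown };

// Hypothetical nullable VARCHAR: std::nullopt is SQL NULL, while an engaged
// optional holding "" is a genuine empty string, not NULL.
using NullableString = std::optional<std::string>;

// Evaluates `col <= rhs` with SQL semantics (binary comparison assumed).
TriBool lessOrEqual(const NullableString& col, const NullableString& rhs)
{
  if (!col || !rhs)
    return TriBool::Unknown;  // any comparison involving NULL is UNKNOWN
  return *col <= *rhs ? TriBool::True : TriBool::False;
}
```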
* toCppCode for ParseTree and TreeNode
* Generated tree compiles
* Put tree constructors into tests
* Minor fixes
* Fixed parse + some constructors
* Fixed includes, removed debug and old data
* Hopefully fix clang errors
* Forgot an override
* More overrides
The idea is relatively simple: encode prefixes of collated strings as
integers and use them to compute extent ranges. This lets us apply
extent elimination to string columns.
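A minimal sketch of the idea, under stated assumptions: the input is taken to be a collation sort key already (here simply the raw bytes, i.e. a binary collation), the prefix length is assumed to be 8 bytes, and `encodePrefix` and `ExtentRange` are hypothetical names. Big-endian packing with zero padding is monotonic, so `a <= b` implies `prefix(a) <= prefix(b)`; that is what makes a min/max range over prefixes safe for elimination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Packs the first 8 bytes of a sort key into a big-endian unsigned integer,
// padding shorter keys with zero bytes. Monotonic (non-strictly) in byte order.
uint64_t encodePrefix(const std::string& key)
{
  uint64_t result = 0;
  for (std::size_t i = 0; i < 8; ++i)
  {
    unsigned char byte = i < key.size() ? static_cast<unsigned char>(key[i]) : 0;
    result = (result << 8) | byte;
  }
  return result;
}

// Hypothetical per-extent min/max range over encoded prefixes.
struct ExtentRange
{
  uint64_t min = UINT64_MAX;
  uint64_t max = 0;

  void add(const std::string& value)
  {
    uint64_t p = encodePrefix(value);
    if (p < min) min = p;
    if (p > max) max = p;
  }

  // For `col = constant`, the extent can be skipped only when the constant's
  // prefix lies outside [min, max]; an equal prefix still requires a scan.
  bool canSkipEquality(const std::string& constant) const
  {
    uint64_t p = encodePrefix(constant);
    return p < min || p > max;
  }
};
```

Because only a prefix is stored, distinct strings sharing their first 8 bytes map to the same integer, so the range can only rule extents out, never confirm a match.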
The patch contains all the necessary code but misses one important
step: we keep the charset index instead of the collation index. Because
of this, some tests in the bugfix suite fail, so the main functionality
is turned off.
This patch is submitted as a PR at all because it contains changes
that make CHAR/VARCHAR columns unsigned, which is needed for the
vectorization work.
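The message does not spell out why unsignedness matters. One plausible illustration (an assumption, not taken from the patch) is that strings packed into integers only keep their byte order when the integers are compared as unsigned: a UTF-8 lead byte such as 0xC3 sets the top bit, which makes the value negative when viewed as signed. `packPrefix`, `lessUnsigned` and `lessSigned` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Big-endian packing of the first 8 bytes of a C string (zero-padded).
uint64_t packPrefix(const char* s)
{
  uint64_t r = 0;
  std::size_t n = std::strlen(s);
  for (std::size_t i = 0; i < 8; ++i)
    r = (r << 8) | (i < n ? static_cast<unsigned char>(s[i]) : 0u);
  return r;
}

// Unsigned comparison preserves lexicographic byte order.
bool lessUnsigned(const char* a, const char* b) { return packPrefix(a) < packPrefix(b); }

// Signed comparison breaks it for bytes >= 0x80 (two's complement assumed,
// which C++20 guarantees and mainstream compilers already used).
bool lessSigned(const char* a, const char* b)
{
  return static_cast<int64_t>(packPrefix(a)) < static_cast<int64_t>(packPrefix(b));
}
```

For example, "z" (0x7A) sorts before "é" (0xC3 0xA9) in byte order, but the signed comparison reports the opposite.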
Part 1:
As part of MCOL-3776, to address a synchronization issue when accessing
the fTimeZone member of the Func class, mutex locks were added to the
accessor and mutator methods. However, this significantly slows down
processing of TIMESTAMP columns in PrimProc, as all threads across
all concurrently running queries serialize on the mutex. This
is because PrimProc has only a single global object of the functor
class (a class derived from Func in utils/funcexp/functor.h) for a given
function name. To fix this problem:
(1) We remove the fTimeZone as a member of the Func derived classes
(hence removing the mutexes) and instead use the fOperationType
member of the FunctionColumn class to propagate the timezone values
down to the individual functor processing functions such as
FunctionColumn::getStrVal(), FunctionColumn::getIntVal(), etc.
(2) To achieve (1), a timezone member is added to the
execplan::CalpontSystemCatalog::ColType class.
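The resulting design can be sketched under simplifying assumptions. `ColTypeSketch` and `LocalTimeFunctorSketch` are hypothetical stand-ins, and the real functors take rows and parameter lists rather than a bare integer. The point is that the time zone becomes an argument carried by the operation type. This lets one shared functor object serve concurrent queries with different time zones without any locking.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for
// execplan::CalpontSystemCatalog::ColType: the time zone now travels with
// each call instead of being stored inside the functor.
struct ColTypeSketch
{
  long timeZone = 0;  // assumed here to be an offset from UTC in seconds
};

// A single instance can be shared by all threads: it has no mutable state,
// so no mutex is needed.
class LocalTimeFunctorSketch
{
 public:
  // Shifts a UTC timestamp (seconds since the epoch) into the caller's zone.
  int64_t getIntVal(int64_t utcSeconds, const ColTypeSketch& operationType) const
  {
    return utcSeconds + operationType.timeZone;
  }
};
```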
Part 2:
Several functors in the Funcexp code call dataconvert::gmtSecToMySQLTime()
and dataconvert::mySQLTimeToGmtSec() to convert between seconds since the
Unix epoch and a broken-down time representation. These functions in turn
call the C library function localtime_r(), which currently has a known bug:
it holds a global lock via a call to __tz_convert. This significantly
reduces performance in multi-threaded applications where multiple threads
call localtime_r() concurrently. More details on the bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=16145
This bug in localtime_r() significantly slowed down functor processing in
PrimProc, since query execution runs the functor code in multiple threads.
As a fix, we remove the calls to localtime_r() from gmtSecToMySQLTime()
and mySQLTimeToGmtSec() by performing the timezone-to-offset conversion
(done in dataconvert::timeZoneToOffset()) during the execution plan
creation in the plugin. Note that localtime_r() is only called when the
time_zone system variable is set to "SYSTEM".
This fix also required changing the timezone type from a std::string to
a long across the system.
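The plan-time approach can be sketched in two steps. First, parse the session time zone into a numeric offset once. Then derive broken-down time with pure integer arithmetic, with no localtime_r() call and no global lock. This sketch only handles fixed offsets such as `+05:30`; named zones with DST rules, and "SYSTEM", need more than one offset. `parseOffset`, `secondsToBrokenDown` and `BrokenDownTime` are hypothetical names, not the dataconvert API. The date math follows Howard Hinnant's civil_from_days algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical broken-down time, standing in for MySQL's MYSQL_TIME.
struct BrokenDownTime
{
  int64_t year;
  unsigned month, day, hour, minute, second;
};

// Parses a fixed offset like "+05:30" or "-08:00" into seconds, once, at
// plan-creation time. Returns false for anything else (e.g. named zones).
bool parseOffset(const std::string& tz, long& offsetSeconds)
{
  if (tz.size() != 6 || (tz[0] != '+' && tz[0] != '-') || tz[3] != ':')
    return false;
  auto digit = [&tz](std::size_t i) { return tz[i] >= '0' && tz[i] <= '9'; };
  if (!digit(1) || !digit(2) || !digit(4) || !digit(5))
    return false;
  long hours = (tz[1] - '0') * 10 + (tz[2] - '0');
  long minutes = (tz[4] - '0') * 10 + (tz[5] - '0');
  offsetSeconds = (tz[0] == '-' ? -1 : 1) * (hours * 3600 + minutes * 60);
  return true;
}

// Converts seconds since the Unix epoch to local broken-down time using a
// precomputed offset: pure arithmetic, thread-safe, lock-free.
BrokenDownTime secondsToBrokenDown(int64_t epochSeconds, long offsetSeconds)
{
  int64_t local = epochSeconds + offsetSeconds;
  int64_t days = local / 86400;
  int64_t secs = local % 86400;
  if (secs < 0)  // floor division for instants before the epoch
  {
    secs += 86400;
    --days;
  }

  days += 719468;  // shift the day count to start at 0000-03-01
  const int64_t era = (days >= 0 ? days : days - 146096) / 146097;
  const int64_t doe = days - era * 146097;                                    // [0, 146096]
  const int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // [0, 399]
  const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // [0, 365]
  const int64_t mp = (5 * doy + 2) / 153;                                     // [0, 11]
  const unsigned d = static_cast<unsigned>(doy - (153 * mp + 2) / 5 + 1);
  const unsigned m = static_cast<unsigned>(mp < 10 ? mp + 3 : mp - 9);
  const int64_t y = yoe + era * 400 + (m <= 2 ? 1 : 0);

  return {y, m, d, static_cast<unsigned>(secs / 3600),
          static_cast<unsigned>(secs % 3600 / 60), static_cast<unsigned>(secs % 60)};
}
```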
This patch is fixing the following bugs:
- MCOL-4609 TreeNode::getIntVal() does not round: implicit DECIMAL->INT cast is not MariaDB compatible
- MCOL-4610 TreeNode::getUintVal() loses precision for narrow decimal
- MCOL-4619 TreeNode::getUintVal() does not round: Implicit DECIMAL->UINT conversion is not like in InnoDB
- MCOL-4650 TreeNode::getIntVal() loses precision for narrow decimal
- MCOL-4651 SEC_TO_TIME(hugePositiveDecimal) returns a negative time
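For MCOL-4609 and MCOL-4619, the MariaDB-compatible behavior is to round rather than truncate (for example, `CAST(2.5 AS SIGNED)` returns 3). A minimal sketch, assuming a narrow DECIMAL stored as an unscaled 64-bit integer plus a scale (value = unscaled / 10^scale) and round-half-away-from-zero; `decimalToIntRounded` is a hypothetical helper, and overflow handling is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Converts unscaled / 10^scale to an integer, rounding half away from zero.
// Assumes scale <= 18 so that 10^scale and 2 * remainder fit in int64_t.
int64_t decimalToIntRounded(int64_t unscaled, unsigned scale)
{
  int64_t divisor = 1;
  for (unsigned i = 0; i < scale; ++i)
    divisor *= 10;
  int64_t quotient = unscaled / divisor;   // truncates toward zero
  int64_t remainder = unscaled % divisor;  // has the sign of unscaled
  if (2 * remainder >= divisor)
    ++quotient;  // positive fraction >= 0.5
  else if (2 * remainder <= -divisor)
    --quotient;  // negative fraction <= -0.5
  return quotient;
}
```

A truncating conversion would return 12 for 12.5; this one returns 13, and -13 for -12.5.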
CI with RelWithDebInfo builds revealed a problem in the main
patch for MCOL-4464, which did not show up with Debug builds.
Methods like:
- getDoubleVal()
- getDateIntVal()
- getDatetimeIntVal()
- getTimestampIntVal()
- getTimeIntVal()
- getUintVal()
- getIntVal()
- getStrVal()
require the caller to initialize the isNull argument to false.
MCOL-4464 did not take this into account.
This patch adds the proper initializations.
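The contract can be sketched as follows. The getter only ever sets isNull to true and never resets it to false, so the caller must initialize the flag itself. Reading an uninitialized bool is undefined behavior in C++, which is consistent with the bug appearing only in some build types. `getIntValSketch` and `evaluateSketch` are hypothetical, simplified signatures, not the TreeNode API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical getter with the contract described above: it sets isNull to
// true for NULL input and otherwise leaves the flag untouched.
int64_t getIntValSketch(const std::optional<int64_t>& cell, bool& isNull)
{
  if (!cell)
  {
    isNull = true;
    return 0;
  }
  return *cell;
}

// Correct caller: the flag is initialized before every call, so it never
// holds garbage or a stale value from a previous evaluation.
std::optional<int64_t> evaluateSketch(const std::optional<int64_t>& cell)
{
  bool isNull = false;  // required by the getter's contract
  int64_t v = getIntValSketch(cell, isNull);
  if (isNull)
    return std::nullopt;
  return v;
}
```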
1. Add wide decimal support to AggregateColumn::evaluate
and TreeNode::getDecimalVal().
2. Use the pm aggregate attributes to determine um aggregate
attributes in TupleAggregateStep::prep2PhasesAggregate.
MCOL-4409 This patch combines VDecimal and Decimal and makes
IDB_Decimal an alias for the result class
MCOL-4409 More boilerplate reduction in Func_mod
Removed a couple of TSInt128::toType() methods
In addition, this fixes a regression in a WHERE clause with a window
function (WF) field on the LHS and an addition of two WF fields on the
RHS. The issue was that SimpleColumn::getDecimalVal() set precision = 19
while one of the addition operands was stored in VDecimal::value instead
of VDecimal::s128Value. addSubtractExecute() in mcs_decimal.cpp assumes
that if precision > 18 and precision <= 38, it must fetch the wide
s128Value rather than the narrow value field. So we fix the precision
set in SimpleColumn::getDecimalVal().
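The broken invariant can be sketched with a simplified struct. `DecimalSketch`, `effectiveValue()` and `addSameScale()` are hypothetical. Only the rule "precision in (18, 38] means read s128Value" comes from the description above. The sketch uses the GCC/Clang `__int128` extension.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decimal holder mirroring the VDecimal::value /
// VDecimal::s128Value split: which slot is valid depends on precision.
struct DecimalSketch
{
  int64_t value = 0;       // valid when precision <= 18
  __int128 s128Value = 0;  // valid when 18 < precision <= 38
  int precision = 0;
  int scale = 0;

  // Picks the slot the same way addSubtractExecute() is described to.
  __int128 effectiveValue() const
  {
    return (precision > 18 && precision <= 38) ? s128Value
                                               : static_cast<__int128>(value);
  }
};

// Adds two decimals of equal scale using the precision-based slot choice.
__int128 addSameScale(const DecimalSketch& a, const DecimalSketch& b)
{
  return a.effectiveValue() + b.effectiveValue();
}
```

If an operand carries precision 19 but its payload sits in `value`, the wide slot (still 0) is read and the operand silently vanishes from the sum. Setting a precision consistent with the populated slot restores the correct result.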
Removed uint128 from joblist/lbidlist.*
Added another toString() method for wide decimals that is EMPTY/NULL-aware
Unified decimal processing in WF functions
Fixed a potential issue in EqualCompData::operator() for
wide-decimal processing
Fixed some signedness warnings
2. Set Decimal precision in SimpleColumn::evaluate().
3. Add support for int128_t in ConstantColumn.
4. Set IDB_Decimal::s128Value in buildDecimalColumn().
5. Check for width 16 first when branching on decimal width.
Fixes:
* Irrelevant where conditions
* Irrelevant const
* A potential infinite loop in treenode
* Bad implicit case fallthroughs
* Explicit markings for required case fallthroughs
* Unused variables
* Unused function
Also disabled some warnings for now; we should fix them later.
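One standard way to make a required case fall-through explicit is the C++17 `[[fallthrough]]` attribute. The commit does not say whether it uses this attribute, a comment, or a macro. The attribute suppresses `-Wimplicit-fallthrough` for that one case only, so unmarked fall-throughs elsewhere are still reported. `describeWidth` is a hypothetical function used only for illustration.

```cpp
#include <cassert>
#include <string>

// Describes a (hypothetical) column width in bytes, with one deliberate,
// explicitly marked fall-through from case 16 into case 8.
std::string describeWidth(int width)
{
  std::string result;
  switch (width)
  {
    case 16:
      result += "wide ";
      [[fallthrough]];  // intentional: a wide decimal is also a decimal
    case 8:
      result += "decimal";
      break;
    default:
      result = "other";
      break;
  }
  return result;
}
```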