This patch improves handling of NULLs in textual fields in ColumnStore.
Previously empty strings were considered NULLs and it could be a problem
if data scheme allows for empty strings. It was also one of major
reasons of behavior difference between ColumnStore and other engines in
MariaDB family.
Also, this patch fixes some other bugs and incorrect behavior, for
example, incorrect comparison for "column <= ''" which evaluates to
constant True for all purposes before this patch.
* MCOL-4560 remove unused xml entries and code that references it.
There is reader code and variables for some of these settings, but nobody uses them.
The idea is relatively simple - encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can eliminate
extents with strings.
The actual patch does have all the code there but miss one important
step: we do not keep collation index, we keep charset index. Because of
this, some of the tests in the bugfix suite fail and thus main
functionality is turned off.
The reason of this patch to be put into PR at all is that it contains
changes that made CHAR/VARCHAR columns unsigned. This change is needed in
vectorization work.
data types TEXT, CHAR, VARCHAR, FLOAT and DOUBLE are not yet supported by vectorized path
This patch introduces an example for Google benchmarking suite to measure a perf diff
b/w legacy scan/filtering code and the templated version
SCommand StrFilterCmd::duplicate() missed these two lines:
filterCmd->leftColType = leftColType;
filterCmd->rightColType = rightColType;
which exist in the parent's FilterCommand::duplicate().
Rewriting the code to avoid duplication by using more inherited
methods/constructors. This reduces the probability of similar bugs
in the future.
* TEXT and BLOB now have separate identifiers internally
* TEXT columns are identified as such in system catalog
* cpimport only requires hex input for BLOB, not TEXT
* DML writes for multi-block dictionary (blob) now works
* PrimProc fixed so that the first block in multi-block is read
correctly
* Performance optimisation (removed string copy into stack) for new
dictionary entries