The EM scalability project has two parts: phase 1 and phase 2.
This is phase 1, which adds an EM index to speed up (from O(n) down
to the speed of boost::unordered_map) EM lookups that turn a
<dbroot, oid, partition> tuple into an LBID,
e.g. most bulk insertion meta info operations.
The basis is boost::shared_managed_object where EMIndex is
stored. Whilst it is not debug-friendly, it allows putting
nested structs into shmem. EMIndex has 3 tiers. Top-down description:
a vector of dbroots, a map of oids to partition vectors, and partition
vectors that hold EM indices.
Separate EM methods now query the index before they do an EM run.
EMIndex has a separate shmem file with the fixed id
MCS-shm-00060001.
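A minimal sketch of the three-tier layout described above, assuming
ordinary heap containers instead of shared memory. All type and method
names here are hypothetical, and the partition tier is modelled as a map
keyed by partition number, which is an assumption rather than the actual
EMIndex layout:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for EMIndex; the real structure lives in shmem.
struct EMIndexSketch {
  // Tier 3: positions of the matching entries in the extent map array.
  using Indices = std::vector<std::size_t>;
  // Assumption: partitions are keyed by partition number.
  using PartitionTier = std::unordered_map<std::uint32_t, Indices>;
  // Tier 2: oid -> partitions.
  using OidTier = std::unordered_map<std::int32_t, PartitionTier>;

  // Tier 1: one slot per dbroot, addressed by dbroot number.
  std::vector<OidTier> dbroots;

  void insert(std::uint16_t dbroot, std::int32_t oid,
              std::uint32_t partition, std::size_t emIdx) {
    if (dbroot >= dbroots.size()) dbroots.resize(dbroot + 1u);
    dbroots[dbroot][oid][partition].push_back(emIdx);
  }

  // Returns nullptr when the tuple is not indexed, so the caller can
  // fall back to the O(n) extent map run.
  const Indices* find(std::uint16_t dbroot, std::int32_t oid,
                      std::uint32_t partition) const {
    if (dbroot >= dbroots.size()) return nullptr;
    const OidTier& oids = dbroots[dbroot];
    auto o = oids.find(oid);
    if (o == oids.end()) return nullptr;
    auto p = o->second.find(partition);
    if (p == o->second.end()) return nullptr;
    return &p->second;
  }
};
```

A lookup touches one vector slot and two hash tables, which is where the
"speed of boost::unordered_map" bound comes from.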
* MCOL-4560 remove unused xml entries and code that references them.
There is reader code and variables for some of these settings, but nobody uses them.
EM and PP are the most resource-hungry runtimes.
The merge makes it possible to control their cumulative
resource consumption and thread allocation, and enables
zero-copy data exchange b/w local EM and PP facilities.
Short CHAR/VARCHAR column values contain integer-encoded strings.
After certain manipulations (orderSwap(strnxfrm(str))) the values
become integers that preserve the order relation of the original
strings according to certain translation rules (collation). Prepared
values are ready to be SIMD-processed.
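A minimal sketch of the encoding idea, assuming a binary collation (so
the strnxfrm step is the identity) and zero padding for values shorter
than 8 bytes. The helper name encodeShortString is hypothetical. Placing
the first character in the most significant byte is what the order swap
achieves on a little-endian load:

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: pack up to 8 bytes of a short string into an
// unsigned integer, first character in the most significant byte.
// Assumes binary collation; a real collation would first map the string
// to its sort weights (the strnxfrm step).
inline std::uint64_t encodeShortString(std::string_view s) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < 8; ++i) {
    // Missing trailing bytes are treated as zero padding.
    std::uint64_t byte = i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
    v = (v << 8) | byte;
  }
  return v;
}
```

With this layout, comparing two encoded values as unsigned integers
gives the same result as a byte-wise comparison of the padded strings,
e.g. encodeShortString("abc") < encodeShortString("abd"). The comparison
has to be unsigned: a leading byte of 0x80 or above would otherwise flip
the sign and break the ordering.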
respondWait could be set to false
while other threads were waiting. With respondWait false, okToRespond
would never get notify_one(). Get rid of respondWait and use
fProcessorPool->blockedThreadCount to determine if any threads may be
waiting.
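A minimal sketch of the pattern, not the actual code: the class, its
members and the queue limit are hypothetical, and std primitives stand in
for whatever the real code uses. The point is that the notifier looks at
a count of blocked threads, which stays accurate with several waiters,
instead of a single shared flag that one waiter can reset for everybody:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical throttle between worker threads and a sender thread.
class SendThrottleSketch {
 public:
  explicit SendThrottleSketch(std::size_t limit) : limit_(limit) {}

  // Worker side: wait while the queue is full, then add one response.
  void enqueue() {
    std::unique_lock<std::mutex> lk(mutex_);
    if (queued_ >= limit_) {
      ++blocked_;  // counted per waiter, so no waiter can hide another
      okToRespond_.wait(lk, [this] { return queued_ < limit_; });
      --blocked_;
    }
    ++queued_;
  }

  // Sender side: remove one response and wake a worker if any is blocked.
  void dequeue() {
    std::lock_guard<std::mutex> lk(mutex_);
    if (queued_ > 0) --queued_;
    if (blocked_ > 0) okToRespond_.notify_one();
  }

  std::size_t queued() const {
    std::lock_guard<std::mutex> lk(mutex_);
    return queued_;
  }

  std::size_t blocked() const {
    std::lock_guard<std::mutex> lk(mutex_);
    return blocked_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable okToRespond_;
  std::size_t limit_;
  std::size_t queued_ = 0;
  std::size_t blocked_ = 0;
};
```

Because the counter is only changed under the same mutex as the wait
predicate, the notifier cannot miss a thread that is about to block.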
The idea is relatively simple: encode prefixes of collated strings as
integers and use them to compute extents' ranges. Then we can do
extent elimination on string columns.
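A minimal sketch of range-based elimination for an equality predicate,
assuming a binary collation and an 8-byte prefix. ExtentRange,
encodePrefix, rangeOf and mayContainEqual are hypothetical names, not
the functions from the patch:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string_view>
#include <vector>

// Hypothetical per-extent metadata: min and max of the encoded prefixes.
struct ExtentRange {
  std::uint64_t min;
  std::uint64_t max;
};

// Encode the first 8 bytes, first character most significant.
// Assumes binary collation; real code would encode collation weights.
inline std::uint64_t encodePrefix(std::string_view s) {
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < 8; ++i) {
    std::uint64_t byte = i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
    v = (v << 8) | byte;
  }
  return v;
}

// Compute the range for the values stored in one extent.
inline ExtentRange rangeOf(const std::vector<std::string_view>& values) {
  // An empty extent gets an empty range (min > max).
  ExtentRange r{std::numeric_limits<std::uint64_t>::max(), 0};
  for (std::string_view v : values) {
    std::uint64_t k = encodePrefix(v);
    r.min = std::min(r.min, k);
    r.max = std::max(r.max, k);
  }
  return r;
}

// False means no row in the extent can equal the needle, so the extent
// can be skipped. True only means the extent has to be scanned.
inline bool mayContainEqual(const ExtentRange& e, std::string_view needle) {
  std::uint64_t k = encodePrefix(needle);
  return e.min <= k && k <= e.max;
}
```

The check is conservative: equal strings always have equal prefixes, so
an out-of-range needle is safely eliminated, while an in-range needle
may still be absent from the extent.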
The actual patch has all the code there but misses one important
step: we do not keep the collation index, we keep the charset index.
Because of this, some of the tests in the bugfix suite fail and thus
the main functionality is turned off.
The reason this patch is put into a PR at all is that it contains
changes that make CHAR/VARCHAR columns unsigned. This change is needed
for the vectorization work.