This patch improves handling of NULLs in textual fields in ColumnStore.
Previously empty strings were considered NULLs and it could be a problem
if data scheme allows for empty strings. It was also one of major
reasons of behavior difference between ColumnStore and other engines in
MariaDB family.
Also, this patch fixes some other bugs and incorrect behavior, for
example, incorrect comparison for "column <= ''" which evaluates to
constant True for all purposes before this patch.
* Fix clang warnings
* Remove vim tab guides
* initialize variables
* 'strncpy' output truncated before terminating nul copying as many bytes from a string as its length
* Fix ISO C++17 does not allow 'register' storage class specifier for outdated bison
* chars are unsigned on ARM, having if (ival < 0) always false
* chars are unsigned by default on ARM and comparison with -1 if always true
wide decimal column in a non-ColumnStore table throws an exception.
ROW::getSignedNullValue() method does not support wide decimal fields
yet. To fix this exception, we remove the call to this method from
CrossEngineStep::setField().
mcsconfig.h and my_config.h have the following
pre-processor definitions:
1. Conflicting definitions coming from the standard cmake definitions:
- PACKAGE
- PACKAGE_BUGREPORT
- PACKAGE_NAME
- PACKAGE_STRING
- PACKAGE_TARNAME
- PACKAGE_VERSION
- VERSION
2. Conflicting definitions of other kinds:
- HAVE_STRTOLL - this is a dirt in MariaDB headers.
Should be fixed in the server code. my_config.h erroneously
performs "#define HAVE_STRTOLL" instead of "#define HAVE_STRTOLL 1".
in some cases. The former is not CMake compatible style. The latter is.
3. Non-conflicting definitions:
Otherwise, mcsconfig.h and my_config.h should be mutually compatible,
because both are generated by cmake on the same host machine. So
they should have exactly equal definitions like "HAVE_XXX", "SIZEOF_XXX", etc.
Observations:
- It's OK to include both mcsconfig.h and my_config.h providing that we
suppress duplicate definition of the above conflicting types #1 and #2.
- There is no a need to suppress duplicate definitions mentioned in #3,
as they are compatible!
- my_sys.h and m_ctype.h must always follow a CMake configuation header,
either my_config.h or mcsconfig.h (or both).
They must never be included without any preceeding configuration header.
This change make sure that we resolve conflicts by:
- either disallowing inclusion of mcsconfig.h and my_config.h
at the same time
- or by hiding conflicting definitions #1 and #2
(with their later restoring).
- also, by making sure that my_sys.h and m_ctype.h always follow
a CMake configuration file.
Details:
- idb_mysql.h can now only be included only after my_config.h
An attempt to use idb_mysql.h with mcsconfig.h instead of
my_config.h is caught by the "#error" preprocessor directive.
- mariadb_my_sys.h can now be only included after mcsconfig.h.
An attempt to use mariadb_my_sys.h without mcscofig.h
(e.g. with my_config.h) is also caught by "#error".
- collation.h now can now be included in two ways.
It now has the following effective structure:
#if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
// Remember current conflicting definitions on the preprocessor stack
// Undefine current conflicting definitions
#endif
#include "mcsconfig.h"
#include "m_ctype.h"
#if defined(PREFER_MY_CONFIG_H) && defined(MY_CONFIG_H)
# Restore conflicting definitions from the preprocessor stack
#endif
and can be included as follows:
a. using only mcsconfig.h as a configuration header:
// my_config.h must not be included so far
#include "collation.h"
b. using my_config.h as the first included configuration file:
#define PREFER_MY_CONFIG_H // Force conflict resolution
#include "my_config.h" // can be included directly or indirectly
...
#include "collation.h"
Other changes:
- Adding helper header files
utils/common/mcsconfig_conflicting_defs_remember.h
utils/common/mcsconfig_conflicting_defs_restore.h
utils/common/mcsconfig_conflicting_defs_undef.h
to perform conflict resolution easier.
- Removing `#include "collation.h"` from a number of files,
as it's automatically included from rowgroup.h.
- Removing redundant `#include "utils_utf8.h"`.
This change is not directly related to the problem being fixed,
but it's nice to remove redundant directives for both collation.h
and utils_utf8.h from all the files that do not really need them.
(this change could probably have gone as a separate commit)
- Changing my_init() to MY_INIT(argv[0]) in the MCS services sources.
After the fix of the complitation failure it appeared that ColumnStore
services compiled with the debug build crash due to recent changes in
safemalloc. The crash happened in strcmp() with `my_progname` as an argument
(where my_progname is a mysys global variable). This problem should
probably be fixed on the server side as well to avoid passing NULL.
But, the majority of MariaDB executable programs also use MY_INIT(argv[0])
rather than my_init(). So let's make MCS do like the other programs do.
If an InnoDB table has spaces in the column names the cross engine step
would fail. This patch wraps quotes around the table and column names to
support this use case.
If a CrossEngine step has multiple filters the filters were overwriting
each other. This fix stores the filters as a vector and processes them
in a loop.
With 1.1 we have removed libdrizzle and used MariaDB's client library
instead for both CrossEngine and QueryStats. Unfortunately MariaDB 10.2
has two client libraries which have different structs with the same
name. When QueryStats was running inside the ColumnStore plugin this
symbol conflict was causing a crash.
The server's built-in client API has several different and several
missing functions so some additions to sm.cpp were made to fill the
gaps.
This patch does the following:
* Make sure that libmariadb is only linked to executables, not the
ColumnStore Plugin (to avoid symbol conflicts). Note that all
executables that link to CrossEngine and/or QueryStats need to link to
libmariadb to avoid missing symbol issues.
* Use the server's built-in client API for QueryStats when run in the
plugin
* Replace missing server built-in client API calls in sm.cpp (this is
for QueryStats and CrossEngine to keep the dynamic linker happy)
* Fixes issue where using 'localhost' as the MariaDB Server hostname
would fail in QueryStats.
When subqueries and group by are used in CrossEngine the first row group
is either corrupted or ignored. This is related to MCOL-430 which fixed
the case for FE1 mode.
We would get strange values for scale/precision in the results column of
a cross engine join causing bad results. This patch uses the values from
the MariaDB client connector instead.
* TEXT and BLOB now have separate identifiers internally
* TEXT columns are identified as such in system catalog
* cpimport only requires hex input for BLOB, not TEXT
This patch adds enough support so that cross engines joins with blob
columns in the foreign engines will work. The modifications are as
follows:
* Add CrossEngine support for non-NULL-terminated (binary) data
* Add row data support for blobs (similar to varbinary)
* Add engine support for writing out blob data correctly to the storage
engine API
* Re-enable blob support in the engine plugin
This does the following:
* Switch resource manager to a singleton which reduces the amount of
times the XML data is scanned and objects allocated.
* Make the I_S tables use the FE implementation of the system catalog
* Make the I_S.columnstore_columns table use the RID list cache
* Make the extentmap pre-allocate a vector instead of many small allocs
Cross engine was using fRowGroupDelivered for pulling the rows out of
libdrizzle and passing them on to the next step in the job list
simultaneously on different threads. Sometimes this is OK, but with
larger data sets it leads to data corruption and race conditions.
For pulling the rows out of libdrizzle this patch uses a new RowGroup
object instead to avoid the collision.
In addition this patch makes the DrizzleMySQL class a dynamically
allocated object. A first step into potentially using unbuffered row
results for performance and lower RAM usage at a later date.