pDictionaryScan won't work for BLOB/TEXT since it requires searching the
data file and rebuilding the token from matches. The tokens can't be
rebuild correctly due the bits in the token used for block counts. This
patch forces the use of pDictionaryStep instead for WHERE conditions.
In addition this patch adds support for TEXT/BLOB in various parts of
the job step processing. This fixes things like error 202 during an
UPDATE with a join condition on TEXT/BLOB columns.
For the initial BLOB/TEXT pull request we put them in the same bucket as
VARBINARY, forcing many functions to be disabled. This patch enables the
same TEXT function support as VARCHAR.
DBRM connections are reused so that we don't have a huge amount of
TIME_WAIT sockets when there are large amounts of DML. Also applied to
i_s.columnstore_files
DBRM connections are reused so that we don't have a huge amount of
TIME_WAIT sockets when there are large amounts of DML. Also applied to
i_s.columnstore_files
* TEXT and BLOB now have separate identifiers internally
* TEXT columns are identified as such in system catalog
* cpimport only requires hex input for BLOB, not TEXT
This patch adds enough support so that cross engines joins with blob
columns in the foreign engines will work. The modifications are as
follows:
* Add CrossEngine support for non-NULL-terminated (binary) data
* Add row data support for blobs (similar to varbinary)
* Add engine support for writing out blob data correctly to the storage
engine API
* Re-enable blob support in the engine plugin
This patch strips out our old version of Snappy and uses the OS version
instead. All our supported OSes have the latest version of Snappy in
their base repositories.
This does the following:
* Switch resource manager to a singleton which reduces the amount of
times the XML data is scanned and objects allocated.
* Make the I_S tables use the FE implementation of the system catalog
* Make the I_S.columnstore_columns table use the RID list cache
* Make the extentmap pre-allocate a vector instead of many small allocs
This fix improves the performance of ExeMgr by doing the following:
* Significantly reduces the amount of time the xml configuration is
scanned
* Uses a much faster way to determine the CPU core count
* Reduces the amount of times certain allocations are executed
* Rowgroup pre-allocates vectors for 1024 rows
This improves performance for the first query of a connection and the
performance for smaller result sets. It may well improve performance in
other areas too.
We already use the OS library with our headers, we are just lucky this
has worked so far. This patch removes the source and switches the the OS
headers instead.
If there is only one packet in the buffer it is possible that the read
doesn't contain the whole packet, the resulting in the buffer pointer
going less than 0 and bad things happening.
Fixes the following:
* Compression ratio calculation was incorrect
* Possible issues due to system catalog thread ID usage
* Compressed file size data count was leaking many FDs when the table
wasn't compressed
* Compressed file size data count was allocating random large amounts
of RAM and then leaking it when the table wasn't compressed
* Add INFORMATION_SCHEMA.COLUMNSTORE_FILES which contains information
about files
* Remove file information from COLUMNSTORE_EXTENTS (due to above)
* Hide columns with Object ID < 3000 (internal columns)
* Fix bad calculation in data_size columns
* Fix minor memory leak
* Add compressedSize() function to IDBFileSystem to get the used file
size for a compressed file
* Add columnstore_info schema with utility stored procedures to access
the information_schema tables
It is possible to have a VARCHAR column that isn't NUL terminated, an
example of this is a union of two CHAR columns. So the length should
always act as a terminator when there is no NUL.