* Adds CompressInterfaceLZ4 which uses LZ4 API for compress/uncompress.
* Adds CMake machinery to search LZ4 on running host.
* All methods which use static data and do not modify any internal data - become `static`,
so we can use them without creation of the specific object. This is possible, because
the header specification has not been modified. We still use 2 sections in header, first
one with file meta data, the second one with pointers for compressed chunks.
* Methods `compress`, `uncompress`, `maxCompressedSize`, `getUncompressedSize` - become
pure virtual, so we can override them for the other compression algos.
* Adds method `getChunkMagicNumber`, so we can verify chunk magic number
for each compression algo.
* Renames "s/IDBCompressInterface/CompressInterface/g" according to requirement.
Setting the definer didn't work and gave errors. This is because I
forgot to set the definer for the function too. This patch instead sets
the security level to INVOKER which is probably a better way of handling
this anyway.
A major upgrade (1.1 -> 1.2 for example) may have issues due to stale
FRM table IDs. This commit adds a stored procedure that changes the
table comment to empty on every ColumnStore table to repair the IDs.
The user should run this as part of the upgrade procedure between major
versions.
Add condition pushdowns to the information_schema tables to give a
performance improvement when a relevant WHERE condition is provided. In
addition there is a new table_usage() stored procedure designed to use
the pushdowns.
The columnstore_info procs could only be called from within
columnstore_info due to them assuming format_filesize() is local. This
patch calls format_filesize() with an explicit schema allowing it to
work correctly.
* Uncompressed columns caused a miscalculation for compression ratio
* We now show a ratio such as 2:1 rather than a percentage
* compressed_data_size instead of file_size is used to show the
compression of actual data, ignoring the pre-allocated segment
* COLUMNSTORE_EXTENTS data_size now shows the total size on the highest
segment of an extent file. It also shows 0 bytes if HWM is zero so that
if there is more than one segment it doesn't show 8192 bytes for the
lower segments.
* COMPRESSION_RATIO was missing some data and using compressed data size
instead of file size (which is probably more realistic).
The old procedure could be wildly incorrect when there were multiple
extents for a dictionary column.
The new one uses tables so that columnstore_files doesn't get
hammered too hard. They can't be temporary tables due to the reuse
restriction on temporary tables.
table_usage() is now called using:
* table_usage(NULL, NULL) - all tables
* table_usage(NULL, 'table') - match tables with the name 'table' in all
schemas
* table_usage('schema', 'table') - match a specific schema and table
combination
Fixes the following:
* Compression ratio calculation was incorrect
* Possible issues due to system catalog thread ID usage
* Compressed file size data count was leaking many FDs when the table
wasn't compressed
* Compressed file size data count was allocating random large amounts
of RAM and then leaking it when the table wasn't compressed
* Add INFORMATION_SCHEMA.COLUMNSTORE_FILES which contains information
about files
* Remove file information from COLUMNSTORE_EXTENTS (due to above)
* Hide columns with Object ID < 3000 (internal columns)
* Fix bad calculation in data_size columns
* Fix minor memory leak
* Add compressedSize() function to IDBFileSystem to get the used file
size for a compressed file
* Add columnstore_info schema with utility stored procedures to access
the information_schema tables