The problem was introduced in the fix for MDEV-26724. That patch made it
possible for histogram collection to fail. In particular, it fails for
non-assigned characters.
When histogram construction fails, we also abort the computation of
COUNT(DISTINCT). When we try to use the value, we get valgrind failures.
Switched the code to abort the statistics collection in this case.
Part#3:
- Make json_escape() return different errors on conversion error
and on out-of-space condition.
- Make histogram code handle conversion errors.
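The split between the two error kinds can be sketched as below. This is only an illustration: escape_ascii(), escape_with_retry() and the EscapeError constants are hypothetical names, not the server's json_escape() API, and the "conversion error" is simplified to "any byte >= 0x80". The point is that the caller can retry with a bigger buffer on out-of-space, but must give up on a conversion error.

```cpp
#include <cstddef>
#include <string>

// Hypothetical error codes; the constants used by the real json_escape()
// may have different names and values.
enum EscapeError { ESCAPE_OUT_OF_SPACE = -1, ESCAPE_ILLEGAL_SYMBOL = -2 };

// Escape `in` into `out`, which has room for `cap` bytes.
// Returns the number of bytes written, or a negative EscapeError.
// Simplification: only '"' and '\\' are escaped, and every byte >= 0x80
// is treated as a character that cannot be converted.
int escape_ascii(const std::string &in, char *out, std::size_t cap)
{
  std::size_t pos = 0;
  for (unsigned char c : in)
  {
    if (c >= 0x80)
      return ESCAPE_ILLEGAL_SYMBOL;        // retrying will not help
    std::size_t need = (c == '"' || c == '\\') ? 2 : 1;
    if (pos + need > cap)
      return ESCAPE_OUT_OF_SPACE;          // retrying with more room may help
    if (need == 2)
      out[pos++] = '\\';
    out[pos++] = static_cast<char>(c);
  }
  return static_cast<int>(pos);
}

// Caller side: grow the buffer on out-of-space, abort on conversion error.
bool escape_with_retry(const std::string &in, std::string &result)
{
  std::size_t cap = 4;
  for (;;)
  {
    std::string buf(cap, '\0');
    int rc = escape_ascii(in, &buf[0], cap);
    if (rc >= 0)
    {
      buf.resize(static_cast<std::size_t>(rc));
      result = buf;
      return true;
    }
    if (rc == ESCAPE_ILLEGAL_SYMBOL)
      return false;                        // caller aborts the collection
    cap *= 2;                              // out of space: enlarge and retry
  }
}
```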
Factor the code that updates count, count_distinct,
count_distinct_single_occurrence into class Basic_stats_collector.
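A minimal sketch of what such a collector might look like, assuming it is fed one call per distinct value together with that value's number of occurrences. BasicStatsSketch and its method names are illustrative, not the actual Basic_stats_collector interface.

```cpp
#include <cstdint>

// Illustrative stand-in for the factored-out counters.
class BasicStatsSketch
{
  std::uint64_t count_ = 0;                            // all values seen
  std::uint64_t count_distinct_ = 0;                   // distinct values
  std::uint64_t count_distinct_single_occurrence_ = 0; // values seen once
public:
  // Called once per distinct value; elem_cnt is how often it occurred.
  void next(std::uint64_t elem_cnt)
  {
    count_ += elem_cnt;
    ++count_distinct_;
    if (elem_cnt == 1)
      ++count_distinct_single_occurrence_;
  }
  std::uint64_t count() const { return count_; }
  std::uint64_t count_distinct() const { return count_distinct_; }
  std::uint64_t count_distinct_single_occurrence() const
  {
    return count_distinct_single_occurrence_;
  }
};
```

Keeping these counters in their own class means a failure in a histogram builder does not have to leave them half-updated.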
Change from Histogram_builder and its descendant Histogram_builder_json
to Histogram_builder (the interface), and Histogram_binary_builder,
Histogram_json_builder.
In Histogram_json_builder, do not forget to collect the right bound
of the right-most bucket.
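The right-bound fix can be illustrated with a toy builder that records only the start value of each bucket while values arrive, and therefore has to add the end of the last bucket explicitly when it is finalized. BucketBoundsSketch is a hypothetical class that assumes a fixed number of values per bucket (at least 1) and sorted input; it is not the Histogram_json_builder code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy bucket builder: values must arrive in sorted order.
class BucketBoundsSketch
{
  std::size_t values_per_bucket;     // assumed to be >= 1
  std::size_t in_current_bucket = 0;
  std::string last_value;
  bool seen_any = false;
public:
  std::vector<std::string> starts;   // left bound of every bucket
  std::string right_bound;           // right bound of the last bucket
  bool has_right_bound = false;

  explicit BucketBoundsSketch(std::size_t n) : values_per_bucket(n) {}

  void next(const std::string &value)
  {
    if (in_current_bucket == 0)
      starts.push_back(value);       // a new bucket starts here
    last_value = value;
    seen_any = true;
    if (++in_current_bucket == values_per_bucket)
      in_current_bucket = 0;         // bucket is full
  }

  // Without this step the right-most bucket would have no upper bound.
  void finalize()
  {
    if (seen_any)
    {
      right_bound = last_value;
      has_right_bound = true;
    }
  }
};
```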
* It also adds an "explain select" statement to the test so that the fprintf calls
can print the computed intervals to mysqld.1.err.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
This fixes the wrong calculation for avg_frequency in json histograms
by replacing the specific histogram objects with the generic Histogram_base class.
It also restores the get/set size functions, as they were useful in calculating fields
for binary histograms.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
A demo of how to use an in-memory data structure for histograms.
The patch shows how to
* convert string form of data to binary form
* compare two values in binary form
* compute a fraction for val in [X, Y] range.
grep for GSOC-TODO for notes.
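The three steps can be sketched for the simplest case, unsigned integers. The helper names below are hypothetical, and the 8-byte big-endian encoding is an assumption chosen so that memcmp() order matches numeric order; the patch itself works with the server's own field formats.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Step 1: string form -> binary form (8 bytes, big-endian).
// Assumes `text` is a non-negative decimal integer.
std::string to_binary(const std::string &text)
{
  std::uint64_t v = std::strtoull(text.c_str(), nullptr, 10);
  std::string out(8, '\0');
  for (int i = 7; i >= 0; --i)
  {
    out[i] = static_cast<char>(v & 0xff);
    v >>= 8;
  }
  return out;
}

// Step 2: compare two values in binary form.
int compare_binary(const std::string &a, const std::string &b)
{
  return std::memcmp(a.data(), b.data(), 8);
}

// Decode the binary form back into a number for arithmetic.
std::uint64_t from_binary(const std::string &b)
{
  std::uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v = (v << 8) | static_cast<unsigned char>(b[i]);
  return v;
}

// Step 3: position of val inside [x, y] as a fraction in [0, 1].
double fraction_in_range(const std::string &val, const std::string &x,
                         const std::string &y)
{
  if (compare_binary(val, x) <= 0)
    return 0.0;
  if (compare_binary(val, y) >= 0)
    return 1.0;
  // Here x < val < y, so both subtractions are positive.
  return static_cast<double>(from_binary(val) - from_binary(x)) /
         static_cast<double>(from_binary(y) - from_binary(x));
}
```

For example, with X = 10 and Y = 20 in binary form, the value 15 lies at fraction 0.5 of the range.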
Preparation for handling different kinds of histograms:
- In Column_statistics, change "Histogram histogram" into
"Histogram *histogram_". This allows for different kinds
of Histogram classes with virtual functions.
- [Almost] remove the usage of Histogram->set_values and
Histogram->set_size. The code outside the histogram should
not make any assumptions about what/how is stored in the Histogram.
- Introduce drafts of methods to read/save histograms to/from disk.
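The shape of this change can be sketched roughly as follows. All class and function names here are illustrative stand-ins, not the server's actual Histogram classes; the idea is only that the column statistics hold a pointer to an abstract histogram, and that parsing and serialization are the histogram's own business.

```cpp
#include <memory>
#include <string>

// Abstract histogram: callers see only this interface.
class HistogramSketch
{
public:
  virtual ~HistogramSketch() {}
  virtual const char *kind() const = 0;
  virtual bool parse(const std::string &stored) = 0;  // read from disk form
  virtual std::string serialize() const = 0;          // save to disk form
};

class BinaryHistogramSketch : public HistogramSketch
{
  std::string bytes;
public:
  const char *kind() const override { return "binary"; }
  bool parse(const std::string &stored) override
  {
    bytes = stored;
    return true;
  }
  std::string serialize() const override { return bytes; }
};

class JsonHistogramSketch : public HistogramSketch
{
  std::string text;
public:
  const char *kind() const override { return "json"; }
  bool parse(const std::string &stored) override
  {
    // Toy validation only: a real parser would check the whole document.
    if (stored.empty() || stored[0] != '{')
      return false;
    text = stored;
    return true;
  }
  std::string serialize() const override { return text; }
};

// The owner stores a pointer, so either kind can be plugged in.
struct ColumnStatisticsSketch
{
  std::unique_ptr<HistogramSketch> histogram_;
};

std::unique_ptr<HistogramSketch> create_histogram(bool json)
{
  if (json)
    return std::unique_ptr<HistogramSketch>(new JsonHistogramSketch());
  return std::unique_ptr<HistogramSketch>(new BinaryHistogramSketch());
}
```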
This fixes the memory allocation for the JSON histogram builder and adds more column types for testing.
Some challenges at the moment include:
* A garbage value at the end of the JSON array still persists.
* A garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
This patch changes the main name of the 3-byte character set from utf8 to
utf8mb3. A new old_mode flag, UTF8_IS_UTF8MB3, is added and set by default,
so that utf8 means utf8mb3. If it is not set, utf8 means utf8mb4.
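The alias rule described above amounts to the following, shown as a small sketch. resolve_charset_name() is a hypothetical helper, not a server function, and it handles only the bare character set name, not collation names derived from it.

```cpp
#include <string>

// Hypothetical helper: map the "utf8" alias to a concrete character set,
// depending on whether the UTF8_IS_UTF8MB3 old_mode flag is set.
std::string resolve_charset_name(const std::string &name,
                                 bool utf8_is_utf8mb3)
{
  if (name == "utf8")
    return utf8_is_utf8mb3 ? "utf8mb3" : "utf8mb4";
  return name;  // every other name is left as it is
}
```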