- Add a testcase showing JSON_HB histograms handle multi-byte characters
correctly.
- Make Item_func_json_unquote::val_str() handle the situation where
it is reading non-UTF8 "JSON" and transcoding it into UTF-8.
(The JSON spec only allows UTF-8, but MariaDB's implementation
supports non-UTF8 as well.)
- Make Item_func_json_search::compare_json_value_wild() handle
json_unescape()'s return values the same way it is done in other
places (see the sketch after this list).
- Coding style fixes.
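A minimal sketch of the json_unescape() convention referenced above
(the buffer and surrounding logic are illustrative only):

    /* json_unescape() both unescapes and transcodes (source charset
       -> result charset). It returns the result length on success
       and a negative code on failure, so callers must check before
       using the buffer. */
    uchar buf[STRING_BUFFER_USUAL_SIZE];            // illustrative size
    int len= json_unescape(value_cs,                // charset of the "JSON"
                           value, value_end,        // escaped input
                           &my_charset_utf8mb4_bin, // transcode to UTF-8
                           buf, buf + sizeof(buf));
    if (len < 0)
      return FALSE;   // failure: treat as "no match", don't read buf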
When json_escape was changed[1] to return -1 for a character that
cannot be converted to the target character set,
json_escape_to_string assumed the -1 meant out of memory and just
looped, retrying with more memory.
Problem 1 - json_escape needs to return a different code for each
case, so that charset incompatibility can be distinguished from an
out-of-memory condition. This enables json_escape_to_string to
handle each one correctly (ignore and fail seems the best option
for an incompatible character).
Problem 2 - JSON histograms need to support columns whose min/max
values are in a character set that isn't represented by a single
byte per character.
Problem 2 was previously hidden because '?' was the result of the
conversion. As JSON histograms can relate to columns which have an
explicit character set, use that, and fall back to binary, which
was the previous default for non-string columns.
Replaces -1/-2 constants and handling with JSON_ERROR_ILLEGAL_SYMBOL /
JSON_ERROR_OUT_OF_SPACE defines.
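With the two defines, the escape-to-string loop can tell the cases
apart; roughly (loop plumbing simplified, only the JSON_ERROR_*
names come from this patch):

    int res= json_escape(str->charset(),
                         (const uchar *) str->ptr(),
                         (const uchar *) str->ptr() + str->length(),
                         &my_charset_utf8mb4_bin,
                         (uchar *) out->ptr(),
                         (uchar *) out->ptr() + out->alloced_length());
    if (res >= 0)
    {
      out->length(res);
      return false;                      // success
    }
    if (res == JSON_ERROR_ILLEGAL_SYMBOL)
      return true;                       // charset mismatch: fail, don't retry
    /* JSON_ERROR_OUT_OF_SPACE: grow the buffer and try again */
    if (out->alloc(out->alloced_length() * 2))
      return true;                       // genuine out-of-memory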
[1] regression from: f699010c0f
Histogram_json_hb::range_selectivity() may return small negative
numbers due to rounding errors in the histogram.
Make sure the returned value is non-negative.
Add an assert to catch negative values that are not small.
(attempt #2)
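The clamping, sketched (the epsilon threshold is illustrative):

    double sel= max_fract - min_fract;  // fractions from the histogram
    /* Rounding errors may push the result slightly below zero. */
    DBUG_ASSERT(sel >= -1e-6);          // catch negatives that aren't small
    return MY_MAX(sel, 0.0);            // never return a negative estimate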
In Histogram_json_hb::point_selectivity(), do return selectivity of 0.0
when the histogram says so.
The logic of "Do not return 0.0 estimate as it causes a multiply-by-zero
meltdown in cost and cardinality calculations" is moved into
records_in_column_ranges() where it is one *once* per column pair (as
opposed to doing once per range, which can cause the error to add-up
to large number when there are many ranges)
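Sketched, with illustrative names, the fixup now sits at the end of
records_in_column_ranges():

    /* After all ranges of the column have been added up: avoid a
       zero estimate once per column, not once per range. */
    if (rows == 0.0)
      rows= 1.0;                        // illustrative floor value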
Followup: remove this line from get_column_range_cardinality()
set_if_bigger(res, col_stats->get_avg_frequency());
and make sure it is only used with the binary histograms.
For JSON histograms, it makes the estimates unnecessarily imprecise.
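That is, roughly (the type check is illustrative; set_if_bigger()
and get_avg_frequency() are the existing names):

    /* Keep the avg_frequency lower bound only for binary histograms;
       JSON histograms are precise enough that it only hurts them. */
    if (col_stats->histogram->get_type() != JSON_HB)  // illustrative
      set_if_bigger(res, col_stats->get_avg_frequency());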
Also report JSON histogram load errors in the error log, as is
already done with other histogram/statistics load errors.
Add test coverage to see what happens if one upgrades but does NOT run
mysql_upgrade.
The previous JSON parser was using an API which made the parsing
inefficient: the same JSON contents were parsed again and again.
Switch to using a lower-level parsing API which allows the parsing
to be done efficiently.
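The lower-level API in question is json_lib's json_engine_t
scanner; a minimal sketch of the pattern (state handling elided):

    json_engine_t je;
    json_scan_start(&je, &my_charset_utf8mb4_bin,
                    (const uchar *) json, (const uchar *) json_end);
    /* Walk the document token by token; no earlier content is
       re-parsed to reach the next key or value. */
    while (!json_scan_next(&je))
    {
      if (je.state == JST_KEY)
      { /* read the key with json_read_keyname_chr(), decide
           whether to descend */ }
      else if (je.state == JST_VALUE)
      { /* json_read_value(&je) fetches the scalar */ }
    }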
- Make Histogram_json_hb::range_selectivity handle singleton buckets
specially when computing selectivity of the max endpoint bound
(for the min endpoint, we already do that; see the sketch after
this list).
- Also, fixed comments for Histogram_json_hb::find_bucket
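A sketch of the singleton-bucket case for the max endpoint (member
and helper names are illustrative):

    /* A bucket holding exactly one distinct value (ndv == 1)
       contributes either all of its fraction or none of it, so no
       within-bucket interpolation applies. */
    if (equal && buckets[idx].ndv == 1)
      max_fract= max_endpoint_inclusive ? bucket_end_fract(idx)
                                        : bucket_start_fract(idx);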
When loading the histogram, use table->field[N], not table->s->field[N].
When we used the latter we would corrupt the field's default value. One
of the consequences of that would be that AUTO_INCREMENT fields would
stop working correctly.
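The distinction, sketched:

    /* table->s->field[N] points into the TABLE_SHARE's shared
       default-values buffer, so storing through it corrupts the
       defaults used by every later INSERT (hence the broken
       AUTO_INCREMENT). table->field[N] points into this TABLE's
       own record buffer and is the one to write to. */
    Field *field= table->field[N];        // correct
    /* Field *field= table->s->field[N];     corrupts default values */
    field->store(text, length, &my_charset_bin);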
The problem was introduced in the fix for MDEV-26724. That patch
made it possible for histogram collection to fail; in particular,
it fails for non-assigned characters.
When histogram construction fails, the computation of
COUNT(DISTINCT) is also aborted, and when we then try to use that
value we get valgrind failures.
Switched the code to abort the whole statistics collection in this
case.
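Sketch of the abort path (names are illustrative):

    /* If histogram construction failed mid-way, abort statistics
       collection for the column entirely, so the partially-computed
       COUNT(DISTINCT) state is never read back (that read was the
       source of the valgrind failures). */
    if (histogram_build_failed)
    {
      delete count_distinct;              // discard partial state
      count_distinct= NULL;
      return TRUE;                        // signal collection failure
    }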
Part#3:
- make json_escape() return different errors on conversion error
and on out-of-space condition.
- Make histogram code handle conversion errors.
- Fix bad tests in the statistics_json test: make them meaningful and
make them work on Windows
- Fix analyze_debug.test: correctly handle errors during ANALYZE
* it also adds an "explain select" statement to the test so that the fprintf calls
can print the computed intervals to mysqld.1.err
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>
This fixes the memory allocation for the JSON histogram builder and adds more column types for testing.
Some challenges at the moment include:
* A garbage value at the end of the JSON array still persists.
* A garbage value also gets appended to bucket values if the column is a primary key.
* There's a memory leak resulting in a "Warning: Memory not freed" message at the end of tests.
Signed-off-by: Michael Okoko <okokomichaels@outlook.com>