collations
Analysis:
When a json path uses a negative index, the corresponding value in the
array_counter[] array reaches -1 at some point (the initial value for a
path step with a negative index is -<size_of_array>, and this value is
incremented as we move forward in the array while parsing it and looking
for the path). Since SKIPPED_STEP_MARK is the maximum uint value, it
compares equal to an int value of -1 in the array, so that entry is
mistaken for a skipped step and the path is corrupted.
Fix:
Make SKIPPED_STEP_MARK the maximum value of INT32.
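The collision can be reproduced in isolation. The following is a minimal
sketch, not the server code: the two constants and both helper functions
are hypothetical stand-ins that only model "maximum uint" versus "maximum
INT32" as the marker value.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical stand-ins for the marker before and after the fix.
constexpr unsigned int OLD_MARK = UINT_MAX;  // assumed: maximum uint value
constexpr int32_t NEW_MARK = INT32_MAX;      // maximum value of INT32

// Models comparing a signed counter against the unsigned marker.
// The explicit cast mirrors the implicit int -> unsigned conversion,
// which maps -1 to UINT_MAX.
bool looks_skipped_old(int counter) {
  return static_cast<unsigned int>(counter) == OLD_MARK;
}

// With a signed marker no conversion takes place, so -1 stays distinct.
bool looks_skipped_new(int counter) {
  return counter == NEW_MARK;
}
```

A counter that starts at -3 for a three-element array passes through -1 on
its way to 0, which is exactly the value the old marker collides with.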
json_normalize_number(): Avoid accessing str past str_len.
The function would seem to work incorrectly when some digits are
not followed by a decimal point (.) or an exponent (E or e).
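A sketch of the kind of bounds check involved, assuming the number is
passed as a pointer plus length and is not NUL-terminated. The helper
count_leading_digits() is hypothetical and is not part of
json_normalize_number().

```cpp
#include <cstddef>

// Hypothetical helper: counts leading ASCII digits in str[0..str_len).
// The length test comes first, so str[i] is only read when i < str_len,
// even if the digits run right up to the end of the buffer.
std::size_t count_leading_digits(const char *str, std::size_t str_len) {
  std::size_t i = 0;
  while (i < str_len && str[i] >= '0' && str[i] <= '9')
    ++i;
  return i;
}
```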
Analysis:
When we skip a level after the path is found, the state of the json
engine changes. This breaks the sequence for json_get_path_next(), which is
called at the end to ensure the json document is valid, and leads to a crash.
Fix:
Use json_scan_next() at the end to check that the json document has correct
syntax (is valid).
Analysis:
Parsing the json path happens only once. When parsing, we set the types of
the path (types_used) for later use. Multiple values get added to the
result set only if the path type has a range or a wildcard.
However, for each row in the table, types_used still gets
overwritten with the default (no multiple values) and is not set again
(because the path is already parsed). Since multiple values depend on the
type of the path, they don't get added to the result set either.
Fix:
Set the default for types_used only if the path is not parsed already.
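The per-row behaviour can be modelled with a small sketch. Everything
here is hypothetical (PathState, prepare_path(), the two flag values); it
only illustrates why an unconditional reset loses the information once the
path has been parsed.

```cpp
// Hypothetical flag values, not the real types_used bits.
constexpr unsigned NO_MULTIPLE_VALUES = 0;
constexpr unsigned HAS_WILDCARD_OR_RANGE = 1;

struct PathState {
  bool parsed = false;
  unsigned types_used = NO_MULTIPLE_VALUES;
};

// Called once per row. reset_always = true models the behaviour before
// the fix; reset_always = false resets only while the path is unparsed.
void prepare_path(PathState &p, bool path_has_wildcard, bool reset_always) {
  if (reset_always || !p.parsed)
    p.types_used = NO_MULTIPLE_VALUES;
  if (!p.parsed) {
    // Parsing happens once and is the only place types_used is set.
    if (path_has_wildcard)
      p.types_used |= HAS_WILDCARD_OR_RANGE;
    p.parsed = true;
  }
}

// Runs prepare_path() for a wildcard path over `rows` rows and returns
// the value of types_used seen by the last row.
unsigned types_after_rows(int rows, bool reset_always) {
  PathState p;
  for (int i = 0; i < rows; ++i)
    prepare_path(p, true, reset_always);
  return p.types_used;
}
```

With the unconditional reset the first row still sees the wildcard flag,
but every later row sees the default.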
procedures
MDEV-22224 caused the parsing of keys with hyphens to break by setting
the state transitions for parsing keys to JE_SYN (syntax error) when
they encounter a hyphen. However json key names may contain hyphens and
still be considered valid json.
This patch changes the state transition table so that key names with
hyphens remain valid. Note that unquoted key names in paths like
$.key-name are also valid again. This restores the previous behaviour
when hyphens were considered part of the P_ETC character class.
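As an illustration only, the sketch below treats a hyphen as an ordinary
key character. The enum, classify() and first_key() are hypothetical and
much simpler than the real state transition table; C_ETC is loosely
modelled on the P_ETC class mentioned above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical character classes for an unquoted key in a path.
enum CharClass { C_DOT, C_ETC };

CharClass classify(char c) {
  if (c == '.')
    return C_DOT;
  return C_ETC;  // the hyphen falls through to the generic class
}

// Returns the first unquoted key after "$." or an empty string if the
// path does not start with "$.".
std::string first_key(const std::string &path) {
  if (path.size() < 2 || path[0] != '$' || path[1] != '.')
    return "";
  std::string key;
  for (std::size_t i = 2; i < path.size() && classify(path[i]) == C_ETC; ++i)
    key += path[i];
  return key;
}
```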
Analysis: The result of the JSON functions (JSON_ARRAY[OBJECT|ARRAY_APPEND|ARRAY_INSERT|INSERT|SET|REPLACE]) is truncated when the function is called on a LONGTEXT field. The overflow occurs when computing the result length, because the LONGTEXT maximum length is the same as the uint32 maximum. This leads to a wrong result length.
Fix: Add static_cast<ulonglong> to avoid the uint32 overflow and fix the arguments used.
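The wrap-around is easy to show on its own. In this sketch the function
names and the `extra` parameter are hypothetical; only the use of
static_cast<ulonglong> comes from the fix described above, and ulonglong is
assumed to be an alias for unsigned long long.

```cpp
#include <cstdint>

typedef unsigned long long ulonglong;  // assumed alias

// Length computed entirely in 32 bits: wraps around modulo 2^32.
uint32_t length_narrow(uint32_t arg_len, uint32_t extra) {
  return arg_len + extra;
}

// Widening one operand first makes the addition happen in 64 bits.
ulonglong length_wide(uint32_t arg_len, uint32_t extra) {
  return static_cast<ulonglong>(arg_len) + extra;
}
```

For an argument whose length is already the uint32 maximum, adding even a
few bytes in 32-bit arithmetic produces a tiny length instead of a large
one.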
Analysis: JSON_VALUE() returns the string "null" instead of a NULL pointer.
Fix: When the type is JSON_VALUE_NULL (which is also a scalar), set
null_value to true and return 0 instead of returning a string.
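A minimal model of that rule, assuming a result that carries both a null
flag and a string pointer. ScalarType, ValueResult and scalar_result() are
hypothetical names and do not reflect the real Item interface.

```cpp
// Hypothetical scalar kinds; SCALAR_NULL plays the role of JSON_VALUE_NULL.
enum ScalarType { SCALAR_STRING, SCALAR_NULL };

struct ValueResult {
  bool null_value;  // true means SQL NULL
  const char *str;  // null pointer when null_value is true
};

// A JSON null scalar becomes SQL NULL rather than the text "null".
ValueResult scalar_result(ScalarType type, const char *text) {
  if (type == SCALAR_NULL)
    return ValueResult{true, nullptr};
  return ValueResult{false, text};
}
```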
Analysis: JSON_OVERLAPS() does not check nested key-value pairs completely.
If there is a nested object, it only scans and validates whether the two json
values overlap until one of the values (which is of type object) is exhausted.
This does not really check whether the two values of the keys are exactly the
same; instead it only checks whether the key-value pairs of one are present
among the key-value pairs of the other.
Fix: Normalize the values (which are of type object) and compare
using string compare. This will validate if two values
are exactly the same.
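The difference between the two checks can be sketched with flat objects.
This is an illustration under simplifying assumptions: Obj is a
hypothetical std::map of key to already-serialized value, and normalize()
is a toy serializer, not the server's JSON normalization.

```cpp
#include <map>
#include <string>

// Hypothetical flat object: key -> serialized value text.
using Obj = std::map<std::string, std::string>;

// Containment check: true once every pair of `a` is found in `b`,
// even if `b` has additional pairs.
bool pairs_contained(const Obj &a, const Obj &b) {
  for (const auto &kv : a) {
    auto it = b.find(kv.first);
    if (it == b.end() || it->second != kv.second)
      return false;
  }
  return true;
}

// Toy normalization: keys in sorted order (std::map iterates sorted),
// no whitespace.
std::string normalize(const Obj &o) {
  std::string out = "{";
  bool first = true;
  for (const auto &kv : o) {
    if (!first)
      out += ",";
    out += "\"" + kv.first + "\":" + kv.second;
    first = false;
  }
  return out + "}";
}

// Exact comparison through the normalized text.
bool exactly_equal(const Obj &a, const Obj &b) {
  return normalize(a) == normalize(b);
}
```

For {"k":1} against {"k":1,"z":2} the containment check succeeds, while
the normalized string comparison reports that the two objects differ.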
Analysis: When comparing json paths, the array_sizes variable is NULL at
the beginning. While handling a double wildcard, json_path_parts_compare()
calls itself recursively and adds an offset to this NULL pointer. This is
undefined behaviour, and the array_sizes variable eventually becomes
non-null (holds some address), so it is later accessed as if it were valid.
This eventually results in a crash.
Fix: If the array_sizes variable is NULL, then pass NULL recursively as well.
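The pattern of keeping an optional pointer null across recursive calls can
be sketched as follows. size_at_depth() and sample_sizes are hypothetical
and only model the pointer handling, not the path comparison itself.

```cpp
// Hypothetical recursive walker with an optional `sizes` array.
// Returns the size stored `depth` levels down, or -1 when no array
// was supplied.
int size_at_depth(const int *sizes, int depth) {
  if (depth == 0)
    return sizes ? *sizes : -1;
  // Only advance a real pointer; a null pointer is passed on as null
  // instead of being turned into a bogus address by pointer arithmetic.
  return size_at_depth(sizes ? sizes + 1 : nullptr, depth - 1);
}

// Sample data for trying the function out.
static const int sample_sizes[] = {10, 11, 12, 13};
```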
path (when range is used)
Analysis: When 0 comes after a space, the json path parser changes the
state to JE_SYN instead of PS_Z (meaning parse zero). Hence the warning.
Fix: Make the state PS_Z instead of JE_SYN.