as first character in key
Analysis:
While parsing the path, if '-' is encountered as part of the key, the
state of the parser changes to error, so NULL is eventually returned.
Fix:
If '-' is encountered as part of the key, change the state appropriately
so that scanning of the key continues.
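As an illustration of the intended state change, here is a minimal,
self-contained sketch of a key scanner that keeps scanning when it meets
'-'; all names are hypothetical, the real parser lives in json_lib:

```cpp
#include <cctype>
#include <cstddef>
#include <iostream>
#include <string>

enum class State { DONE, ERROR };

// Scan one path key up to the next '.' or the end of the input.
// The fix corresponds to keeping '-' in the "valid key character" set
// instead of letting it fall through to the error branch.
static State scan_key(const std::string &path, size_t &pos, std::string &key)
{
  while (pos < path.size() && path[pos] != '.')
  {
    char c= path[pos];
    if (std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '-')
      key+= c;
    else
      return State::ERROR;   // before the fix, '-' ended up here
    pos++;
  }
  return State::DONE;
}

int main()
{
  size_t pos= 0;
  std::string key;
  if (scan_key("-key", pos, key) == State::DONE)
    std::cout << key << '\n';   // prints "-key" instead of failing
}
```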
from Item::val_json, UBSAN: member access within null pointer of
type 'struct String' in sql/item_jsonfunc.cc
Analysis:
The first argument of json_schema_valid() needs to be a constant.
Fix:
Parse the schema if the item is constant; otherwise set the function up to
return NULL.
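A minimal sketch of the approach with hypothetical names (the real logic
sits in the Item machinery of sql/item_jsonfunc.cc): parse the schema once
when the argument is constant, and otherwise mark the function as
always-NULL:

```cpp
#include <iostream>
#include <memory>
#include <string>

struct Parsed_schema { std::string text; };   // stand-in for the parsed form

struct Schema_valid_func
{
  std::unique_ptr<Parsed_schema> schema;
  bool always_null= false;

  // Analogous to fix-time setup: only a constant schema can be parsed once
  // and reused for every row; anything else makes the result NULL.
  void setup(const std::string &arg, bool arg_is_const)
  {
    if (arg_is_const)
      schema.reset(new Parsed_schema{arg});
    else
      always_null= true;
  }
};

int main()
{
  Schema_valid_func f;
  f.setup("{\"type\":\"object\"}", true);
  std::cout << (f.always_null ? "NULL" : f.schema->text) << '\n';
}
```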
data from a table similar to other JSON functions
Analysis:
Since we are fetching values for every row (because we are running SELECT
over all rows of a table), the correct value can only be obtained at the
time val_int() is called, because it is called to get the value for each
row.
Fix:
Set up the hash for each row (in val_int()) instead of doing it while
fixing fields.
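A hypothetical sketch of the difference (the real hook is val_int() in
sql/item_jsonfunc.cc): the hash is rebuilt from the argument each time a
row is evaluated, because the argument's value can differ per row:

```cpp
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

// Per-row evaluation: the hash is built from this row's argument values.
// Building it once at fix-fields time would freeze the first row's values.
static bool contains(const std::vector<std::string> &keys_arg,
                     const std::string &key)
{
  std::unordered_set<std::string> hash(keys_arg.begin(), keys_arg.end());
  return hash.count(key) != 0;
}

int main()
{
  // Two "rows" whose key-list argument differs.
  std::cout << contains({"a", "b"}, "a") << '\n';  // 1
  std::cout << contains({"c"}, "a") << '\n';       // 0
}
```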
The idea is to have simple functions that the user can combine to produce
exactly the result they want: a JSON object that has keys in common with
another JSON object, matching key/value pairs, and so on. Making the
functions simple helps here.
We accomplish this by making three separate functions; a sketch of all
three follows the list.
1) JSON_OBJECT_FILTER_KEYS(Obj, Arr_keys):
Put the keys (which are basically strings) in a hash, then go over the
object, getting its keys one by one. If a key is present in the hash,
add the key-value pair to the result.
2) JSON_OBJECT_TO_ARRAY(Obj):
Create a string variable, go over the JSON object, and add each key-value
pair as a two-element array to the result.
3) JSON_ARRAY_INTERSECT(arr1, arr2) :
Go over one of the JSON arrays and add each of its items to a hash
(after normalizing each item). Then go over the second array and look up
each normalized item in the hash. If an item is found, add it to the
result.
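Below is a compact sketch of the three algorithms, using std::map as a
stand-in for a parsed JSON object and plain strings for normalized JSON
values; it illustrates the hashing idea, not the server code:

```cpp
#include <iostream>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// 1) JSON_OBJECT_FILTER_KEYS: hash the wanted keys, keep matching pairs.
std::map<std::string, std::string>
filter_keys(const std::map<std::string, std::string> &obj,
            const std::vector<std::string> &keys)
{
  std::unordered_set<std::string> wanted(keys.begin(), keys.end());
  std::map<std::string, std::string> result;
  for (const auto &kv : obj)
    if (wanted.count(kv.first))
      result.insert(kv);
  return result;
}

// 2) JSON_OBJECT_TO_ARRAY: each key-value pair becomes a two-element array.
std::string object_to_array(const std::map<std::string, std::string> &obj)
{
  std::string result= "[";
  for (const auto &kv : obj)
  {
    if (result.size() > 1) result+= ", ";
    result+= "[\"" + kv.first + "\", " + kv.second + "]";
  }
  return result + "]";
}

// 3) JSON_ARRAY_INTERSECT: hash one (normalized) array, probe with the other.
std::vector<std::string> array_intersect(const std::vector<std::string> &a,
                                         const std::vector<std::string> &b)
{
  std::unordered_set<std::string> hash(a.begin(), a.end());
  std::vector<std::string> result;
  for (const auto &item : b)
    if (hash.count(item))
      result.push_back(item);
  return result;
}

int main()
{
  std::map<std::string, std::string> obj{{"a", "1"}, {"b", "2"}};
  std::cout << object_to_array(filter_keys(obj, {"a"})) << '\n'; // [["a", 1]]
  for (const auto &s : array_intersect({"1", "2"}, {"2", "3"}))
    std::cout << s << '\n';                                      // 2
}
```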
Implementation Idea: Holyfoot (Alexey Botchkov)
Author: tanruixiang and Rucha Deodhar
objects
Idea behind implementation:
We get the JSON object specified by the JSON path. Then we transform it
into key-value pairs by going over the JSON, getting each key-value pair
one by one and returning the result.
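A tiny hypothetical sketch of that flow, with a nested std::map standing
in for a JSON document and a direct lookup standing in for the path step:

```cpp
#include <iostream>
#include <map>
#include <string>

using Json= std::map<std::string, std::map<std::string, std::string>>;

int main()
{
  Json doc{{"item", {{"name", "laptop"}, {"color", "black"}}}};
  const auto &obj= doc.at("item");   // stand-in for a '$.item' path lookup
  for (const auto &kv : obj)         // return the pairs one by one
    std::cout << kv.first << " : " << kv.second << '\n';
}
```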
collations
Fix by Alexey Botchkov
The 'value_len' is calculated incorrectly for multibyte charsets. In the
read_strn() function we get the length of the string including the final
'"' character, so we have to subtract its length from the value_len. And
a length of 1 isn't correct for the ucs2 charset (it must be 2).
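A worked example of the length arithmetic (the helper below is
hypothetical; in the server the quote's width comes from the charset):
the length reported by read_strn() includes the closing '"', whose width
is 1 byte in utf8 but 2 bytes in ucs2:

```cpp
#include <cstddef>
#include <iostream>

// Subtract the charset-specific width of the closing '"' from the
// length that includes it.
static size_t value_len(size_t len_with_quote, size_t quote_char_len)
{
  return len_with_quote - quote_char_len;
}

int main()
{
  // "abc\"" in utf8: 4 bytes total, quote is 1 byte -> value_len 3
  std::cout << value_len(4, 1) << '\n';
  // the same text in ucs2: 8 bytes total, quote is 2 bytes -> value_len 6
  std::cout << value_len(8, 2) << '\n';
}
```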
collations
Analysis:
When we have a negative index, the value in the array_counter[] array
reaches -1 at some point (because for a negative index in a JSON path the
counter starts at -<size_of_array> and increments as we move forward
through the array while matching the path). Since SKIPPED_STEP_MARK is the
maximum uint value, it gets compared with an int value in the array,
eventually compares equal to -1, and corrupts the path matching.
Fix:
Make SKIPPED_STEP_MARK the maximum value of int32 instead.
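The comparison bug can be reproduced in isolation; the names below mirror
the description, but the snippet is a standalone demonstration of C++'s
usual arithmetic conversions, not the server code:

```cpp
#include <climits>
#include <cstdint>
#include <iostream>

int main()
{
  unsigned int skipped_step_mark= UINT_MAX;  // old marker value
  int counter= -1;                           // negative-index counter
  // -1 is converted to unsigned and becomes UINT_MAX: prints "equal".
  std::cout << (counter == skipped_step_mark ? "equal" : "different") << '\n';

  // With the fix the marker fits in int32, so no counter value collides.
  unsigned int fixed_mark= INT32_MAX;
  std::cout << (counter == fixed_mark ? "equal" : "different") << '\n';
}
```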
Analysis: null_value is not set if any one of the arguments is NULL, so 1
is returned.
Fix: When either argument is NULL, set null_value to true so that NULL can
be returned.
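A minimal sketch of the fix in that convention (null_value is the flag
MariaDB Item classes use; the function here is a simplified stand-in for
the real val_int()):

```cpp
#include <iostream>
#include <string>

typedef long long longlong;

// Simplified stand-in: before the fix, the NULL case fell through without
// setting the flag, so the caller saw a regular 1.
static longlong val_int_sketch(bool arg0_null, bool arg1_null,
                               bool *null_value)
{
  if (arg0_null || arg1_null)
  {
    *null_value= true;   // the fix: flag the result as NULL
    return 0;
  }
  *null_value= false;
  return 1;
}

int main()
{
  bool is_null;
  longlong v= val_int_sketch(true, false, &is_null);
  std::cout << (is_null ? "NULL" : std::to_string(v)) << '\n';  // NULL
}
```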
starts with '['
Analysis:
When the type is non-scalar and the JSON document has a syntax error,
the error is not detected while validating the type. And since the other
validate functions take a const argument, the error state never gets
stored.
Fix:
After we run out of schemas in the schema list (when there was no error
during validation), keep going over the JSON document until there is a
parse error or the document has ended.
json_normalize_number(): Avoid accessing str past str_len.
The function would seem to work incorrectly when some digits are
not followed by a decimal point (.) or an exponent (E or e).
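A minimal illustration of the bounds rule (hypothetical helper, not the
actual json_normalize_number()): every access to str[i] is guarded by
i < str_len, so a buffer that ends in digits with no '.' or exponent is
never read past its end:

```cpp
#include <cctype>
#include <cstddef>
#include <iostream>

static size_t count_leading_digits(const char *str, size_t str_len)
{
  size_t i= 0;
  while (i < str_len && std::isdigit(static_cast<unsigned char>(str[i])))
    i++;                       // check the bound before touching str[i]
  return i;
}

int main()
{
  const char buf[]= {'1', '2', '3'};  // no terminator, no '.' or 'e' after
  std::cout << count_leading_digits(buf, sizeof buf) << '\n';  // prints 3
}
```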
object of type 'Item_string' in sql/json_schema.cc
Analysis: make_string_literal() returns a pointer of type
Item_basic_constant, which is converted to a pointer of type Item_string.
Since Item_basic_constant is a base class of Item_string, this conversion
is a downcast, hence the error.
Fix: Using the constructor of Item_string directly instead of downcasting
is more appropriate.
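A generic illustration of the pattern, with Base and Derived standing in
for Item_basic_constant and Item_string: downcasting a pointer that does
not actually point at the derived type is the undefined behaviour UBSAN
flags, while constructing the derived type directly is well defined:

```cpp
struct Base { virtual ~Base() {} };
struct Derived : Base { int extra= 42; };

Base *make_base() { return new Base; }   // like make_string_literal()

int main()
{
  // Buggy pattern: the object is only a Base, so this downcast is UB.
  // Derived *d= static_cast<Derived*>(make_base());

  // Fixed pattern: construct the derived type directly.
  Derived *d= new Derived;
  int v= d->extra;
  delete d;
  return v == 42 ? 0 : 1;
}
```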
always return 1
Analysis: The implementation used double to store the value; longlong is
the better choice.
Fix: Use longlong both for multiple_of and for the value it is checked
against, and use num_flag to check for decimals.
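A small demonstration of why the integral type is the safer fit here
(standalone sketch, not the server code): the integer remainder check is
exact, while routing the same value through double silently rounds it:

```cpp
#include <cmath>
#include <iostream>

typedef long long longlong;

static bool is_multiple_ll(longlong value, longlong multiple_of)
{
  return value % multiple_of == 0;
}

int main()
{
  longlong value= 9007199254740993LL;   // 2^53 + 1, divisible by 3
  std::cout << is_multiple_ll(value, 3) << '\n';                // exact: 1
  // The double path rounds 2^53 + 1 down to 2^53, so the check fails: 0
  std::cout << (std::fmod(static_cast<double>(value), 3.0) == 0.0) << '\n';
}
```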
Analysis: multipleOf must be strictly greater than 0. However, the
implementation only disallowed values strictly less than 0.
Fix: Check whether the value is less than or equal to 0 instead of less
than 0, and return true in that case.
Also fixed the incorrect return value for some other keywords.
Analysis: The current implementation does not check the number of
elements in the enum array, nor whether they are unique.
Fix: Add a counter that counts the number of elements, and before
inserting an element into the enum hash, check whether it already exists.
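A sketch of the fixed checks, with std::unordered_set standing in for the
enum hash: count the elements up front and reject an element that an
insert finds already present:

```cpp
#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

static bool valid_enum(const std::vector<std::string> &elements)
{
  if (elements.empty())            // the element-count check
    return false;
  std::unordered_set<std::string> seen;
  for (const auto &e : elements)
    if (!seen.insert(e).second)    // already present => not unique
      return false;
  return true;
}

int main()
{
  std::cout << valid_enum({"a", "b"}) << '\n';  // 1
  std::cout << valid_enum({"a", "a"}) << '\n';  // 0
  std::cout << valid_enum({}) << '\n';          // 0
}
```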
input json schema
Analysis: In case of a syntax error while scanning the JSON schema, the
scan reports success and only a warning is raised instead of an error.
Fix: Return true (error) instead of false.
unevaluatedProperties with properties declared in subschemas
Analysis:
When a key fails to validate against "properties" while "properties" is
being treated as an alternate schema, validation needs to fall back on the
alternate schema for "properties" itself ("unevaluatedProperties" in the
context of this bug). But that does not happen, and we end up returning
false (= validated).
Fix:
When "properties" fails to validate as an alternate schema, fall back on
alternate schema for "properties" itself.
regex
Analysis:
When initializing the Regexp_processor_pcre object, we set the
PCRE2_CASELESS flag, which is responsible for case-insensitive comparison.
Fix:
Unset the flag after initializing.
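PCRE2_CASELESS is a real PCRE2 compile option; a minimal sketch of the
fix (assuming pcre2.h is available) is simply clearing that bit from the
option word after the defaults are set:

```cpp
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>
#include <cstdint>
#include <iostream>

int main()
{
  uint32_t options= PCRE2_CASELESS;   // set by default during initialization
  options&= ~PCRE2_CASELESS;          // the fix: unset it afterwards
  std::cout << options << '\n';       // 0 => case-sensitive matching
}
```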
comment 2)
Analysis:
The flag that checks uniqueness gets reset every time, so unique values
cannot be checked correctly.
Fix:
Do not reset the flag that checks uniqueness for null, true and false
values.
comment 3)
Analysis:
The current implementation checks for the appropriate value but does not
return true.
Fix:
return true on error
comment 4)
Analysis:
The current implementation did not check the value type of the values
inside the required array.
Fix:
Check the type of the values inside the required array.
Implementation:
The implementation is made according to the JSON Schema validation draft
2020.
A JSON schema basically has the same structure as a JSON object,
consisting of key-value pairs, so it can be parsed in the same manner as
any JSON object.
However, none of the keywords are mandatory, so guessing the JSON value
type based only on the keywords would be incorrect.
Hence we need separate objects denoting each keyword.
So during create_object_and_handle_keyword() we create the appropriate
objects based on the keywords, and validate each of them individually
against the JSON document by calling the respective validate() function
when the type matches. If any of them fails, return false; else return
true.
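The scheme can be pictured with a structural sketch (hypothetical classes,
not the server's hierarchy): one small object per keyword, each carrying
its own validate(), and the document passes only if every keyword object
whose type applies accepts it:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct Json_value { std::string type; double number; };

struct Keyword
{
  virtual ~Keyword() {}
  virtual bool validate(const Json_value &v) const= 0;
};

struct Keyword_maximum : Keyword
{
  double max;
  explicit Keyword_maximum(double m) : max(m) {}
  bool validate(const Json_value &v) const override
  { return v.type != "number" || v.number <= max; } // applies to numbers only
};

struct Keyword_type : Keyword
{
  std::string type;
  explicit Keyword_type(std::string t) : type(std::move(t)) {}
  bool validate(const Json_value &v) const override
  { return v.type == type; }
};

// Counterpart of create_object_and_handle_keyword() plus the validation
// loop: every keyword object must accept the document.
static bool validate_all(const std::vector<std::unique_ptr<Keyword>> &schema,
                         const Json_value &doc)
{
  for (const auto &k : schema)
    if (!k->validate(doc))
      return false;            // any failing keyword fails the document
  return true;
}

int main()
{
  std::vector<std::unique_ptr<Keyword>> schema;
  schema.emplace_back(new Keyword_type("number"));
  schema.emplace_back(new Keyword_maximum(10));
  std::cout << validate_all(schema, {"number", 7}) << '\n';   // 1
  std::cout << validate_all(schema, {"number", 12}) << '\n';  // 0
}
```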