Remove the libfuzzer max_len option which doesn't apply to other
fuzzing engines. Enforce the maximum length directly in the fuzz
targets. For the xml target, lower the maximum when expanding entities
to avoid timeout and OOM errors.
The function is only used once and its return value is only checked for
zero. Disable the function like its Max counterpart and add an
implementation for the special case.
Found by OSS-Fuzz.
The return type of xmlRegisterCharEncodingHandler() is void. The invoker
cannot determine whether xmlRegisterCharEncodingHandler() is executed
successfully. when nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the
"handler" is not added to the array "handlers". As a result, the memory
of "handler" cannot be managed and released: memory leakage.
so add "xmlfree(handler)" to fix memory leakage on the failure branch of
xmlRegisterCharEncodingHandler().
Reported-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
The xmlSchemaGetFacetValueAsUlong() API is an external API.
The validity of external input parameters must be strictly verified.
Before accessing "facet->val->value", we need check whether "facet->val" is
a null pointer.
Signed-off-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
All the compiler switches essentially set the same macros. The only
exception was MSVC which omitted the "extern" keyword for exported
variables. This in turn broke clang-cl.
This commit rewrites and simplifies the whole header.
Closes#163.
Null bytes in the input stream do not necessarily signal an EOF
condition. Check the stream pointers for EOF to avoid quadratic
rescanning of input data.
Note that the CUR_CHAR macro used in functions like htmlParseCharData
calls htmlCurrentChar which translates null bytes.
Found by OSS-Fuzz.
key/unique/keyref schema attributes currently use qudratic loops
to check their various constraints (that keys are unique and that
keyrefs refer to existing keys). That becomes extremely slow if
there are many elements with keys. This happens in the wild with
e.g. the OVAL XML descriptions of security patches. You need the
openscap schemata, and then an example xml file:
% zypper in openscap-utils
% wget ftp://ftp.suse.com/pub/projects/security/oval/opensuse.leap.15.1.xml
% time xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates
real 16m59,857s
user 16m55,787s
sys 0m1,060s
This patch makes libxml use a hash table to avoid the quadratic
behaviour. The existing hash table only accepts strings as keys, so
we're mostly reusing the canonical representation of key values to derive
such strings (with the caveat given in a comment). The alternative
would be to rework the hash table code to accept either numbers or free
functions as hash workers, but the code is fast enough as is.
With the patch we have this then:
% time LD_LIBRARY_PATH=./libxml2/.libs/ ./libxml2/.libs/xmllint --schema /usr/share/openscap/schemas/oval/5.5/oval-definitions-schema.xsd opensuse.leap.15.1.xml > /dev/null
opensuse.leap.15.1.xml validates
real 0m3,531s
user 0m3,427s
sys 0m0,103s
So, a ~300x speedup. This patch survives 'make check' and 'make tests'.
Define PY_SSIZE_T_CLEAN macro in python/libxml.c and cast the string
length (int len) explicitly to Py_ssize_t when passing a string to a
function call using PyObject_CallMethod() with the "s#" format.
The Python extension module now uses Py_ssize_t rather than int for
string lengths. This change makes the extension compatible with
Python 3.10.
Fixes#203.
Found by running the fuzz/uri.c fuzzer under asan (internal Android bug
171610679).
Always free `ret` when exiting on failure. I've moved the definition of
NULLCHK down past where ret is always initialized to make it clear that
this is safe.
This patch also fixes the indentation of two of the NULLCHK call sites
to make it more obvious that NULLCHK isn't `if`-like.
Don't process XIncludes in the result of another inclusion to avoid
infinite recursion resulting in a call stack overflow.
This is something the XInclude engine shouldn't allow but correct
handling of intra-document includes would require major changes.
Found by OSS-Fuzz.
Previously, xmlParseCharData and xmlParseComment would consider 0xA to
be unhandleable when seen as the first byte of an input chunk, and
fall back to xmlParseCharDataComplex and xmlParseCommentComplex, which
have different memory and performance characteristics.
FixesGNOME/libxml2#192
Check parent pointers for NULL after the non-recursive rewrite of the
serialization code. This avoids segfaults with corrupted documents
which can apparently be seen with lxml, see issue #187.
The XML Reader can free text nodes coming from the XInclude engine
before parsing has finished. Cache a copy of the text string, not the
included node to avoid use after free.
Found by OSS-Fuzz.
xml:id creates ID attributes even in documents without a DTD, so the
check in xmlTextReaderFreeProp must be changed to avoid use after free.
Found by OSS-Fuzz.
Keeping objects on a free list can hide memory errors. Only allow a
single node on free lists used by the XML reader when fuzzing. This
should hide fewer errors while still exercising the free list logic.
Always limit nested functions calls to 5000. This avoids call stack
overflows with deeply nested expressions.
The expression parser produces about 10 nested function calls when
parsing a subexpression in parentheses, so the effective nesting limit
is about 500 which should be more than enough.
Use a lower limit when fuzzing to account for increased memory usage
when using sanitizers.
The code wasn't dead after all, but I can see no reason in delaying
the XPointer evaluation. This could lead to nodes included earlier
appearing in XPointer results.