The code in xmlParseStartTag2 must handle the case that the input
buffer was grown and reallocated which can invalidate pointers to
attribute values. Before, this was handled by detecting changes of
the input buffer "base" pointer and, in case of a change, jumping
back to the beginning of the function and reparsing the start tag.
The major problem of this approach is that whether an input buffer is
reallocated is nondeterministic, resulting in seemingly random test
failures. See the mailing list thread "runtest mystery bug: name2.xml
error case regression test" from 2012, for example.
If a reallocation was detected, the code also made no attempts to
continue parsing in case of errors which makes a difference in
the lax "recover" mode.
Now we store the current input buffer "base" pointer for each (not
separately allocated) attribute in the namespace URI field, which isn't
used until later. After the whole start tag was parsed, the pointers
to the attribute values are reconstructed using the offset between the
new and the old input buffer. This relies on arithmetic on dangling
pointers which is technically undefined behavior. But it seems like
the easiest and most efficient fix and a similar approach is used in
xmlParserInputGrow.
This changes the error output of several tests, typically making it
more verbose because we try harder to continue parsing in case of
errors.
(Another possible solution is to check not only the "base" pointer
but the size of the input buffer as well. But this would result in
even more reparsing.)
The API tests combine string buffers with arbitrary length values which
makes ASan detect out-of-bound array accesses. Even without ASan, this
could lead to unwanted test failures.
Add a check for "len", "size", and "start" arguments, assuming they
apply to the nearest char pointer. Skip the test if they exceed the
buffer size. This is a somewhat naive heuristic but it seems to work
well.
Don't count leading zeros towards the fraction size limit. This allows
to parse numbers like
0.0000000000000000000000000000000000000000000000000000000001
which is the only standard-conformant way to represent such numbers, as
scientific notation isn't allowed in XPath 1.0. (It is allowed in XPath
2.0 and in libxml2 as an extension, though.)
Overall accuracy is still bad, see bug 783238.
Use the C library's floor and ceil functions. The old code was overly
complicated for no apparent reason and could result in undefined
behavior when handling NaNs (found with afl-fuzz and UBSan).
Fix wrong comment in xmlXPathRoundFunction. The implementation was
already following the spec and rounding half up.
When traversing the "preceding" axis from an attribute node, we must
first go up to the attribute's containing element. Otherwise, text
children of other attributes could be returned. This made it possible
to hit a code path in xmlXPathNextAncestor which contained another bug:
The attribute node was initialized with the context node instead of the
current node. Normally, this code path is only hit via
xmlXPathNextAncestorOrSelf in which case the current and context node
are the same.
The combination of the two bugs could result in an infinite loop, found
with libFuzzer.
Traversing the "following" and the "preceding" axis from namespace nodes
should be handled similarly. This wasn't supported at all previously.
Move the check for trailing characters from xmlXPathEval to
xmlXPathEvalExpr. Otherwise, a valid portion of a syntactically invalid
expression would be evaluated before returning an error.
Move cleanup of XPath stack to xmlXPathFreeParserContext. This avoids
memory leaks if valuePop fails in some error cases. Found with
libFuzzer and ASan.
Rework handling of the final XPath result object in
xmlXPathCompiledEvalInternal and xmlXPathEval to avoid useless error
messages.
Triggered in mixed content ELEMENT declarations if there's an invalid
name after the first valid name:
<!ELEMENT para (#PCDATA|a|<invalid>)*>
Found with libFuzzer and ASan.
For https://bugzilla.gnome.org/show_bug.cgi?id=772726
* include/libxml/parser.h: Add a new parser flag XML_PARSE_NOXXE
* elfgcchack.h, xmlIO.h, xmlIO.c: associated loading routine
* include/libxml/xmlerror.h: new error raised
* xmllint.c: adds --noxxe flag to activate the option
Namespace nodes must be copied to avoid use-after-free errors.
But they don't necessarily have a physical representation in a
document, so simply disallow them in XPointer ranges.
Found with afl-fuzz.
Fixes CVE-2016-4658.
The old code would invoke the broken xmlXPtrRangeToFunction. range-to
isn't really a function but a special kind of location step. Remove
this function and always handle range-to in the XPath code.
The old xmlXPtrRangeToFunction could also be abused to trigger a
use-after-free error with the potential for remote code execution.
Found with afl-fuzz.
Fixes CVE-2016-5131.
For https://bugzilla.gnome.org/show_bug.cgi?id=766834
vctxt->parserCtxt is always NULL in xmlSchemaSAXHandleStartElementNs,
so this function can't call xmlStringLenDecodeEntities to decode the
entities.
For https://bugzilla.gnome.org/show_bug.cgi?id=762100
When we detect a recusive entity we should really not
build the associated data, moreover if someone bypass
libxml2 fatal errors and still tries to serialize a broken
entity make sure we don't risk to get ito a recursion
* parser.c: xmlParserEntityCheck() don't build if entity loop
were found and remove the associated text content
* tree.c: xmlStringGetNodeList() avoid a potential recursion
For https://bugzilla.gnome.org/show_bug.cgi?id=758606
* parserInternals.c:
(xmlNextChar): Add an test to catch other issues on ctxt->input
corruption proactively.
For non-UTF-8 charsets, xmlNextChar() failed to check for the end
of the input buffer and would continuing reading. Fix this by
pulling out the check for the end of the input buffer into common
code, and return if we reach the end of the input buffer
prematurely.
* result/HTML/758606.html: Added.
* result/HTML/758606.html.err: Added.
* result/HTML/758606.html.sax: Added.
* result/HTML/758606_2.html: Added.
* result/HTML/758606_2.html.err: Added.
* result/HTML/758606_2.html.sax: Added.
* test/HTML/758606.html: Added test case.
* test/HTML/758606_2.html: Added test case.
For https://bugzilla.gnome.org/show_bug.cgi?id=759573
* parser.c:
(xmlParseElementDecl): Return early on invalid input to fix
non-minimized test case (759573-2.xml). Otherwise the parser
gets into a bad state in SKIP(3) at the end of the function.
(xmlParseConditionalSections): Halt parsing when hitting invalid
input that would otherwise caused xmlParserHandlePEReference()
to recurse unexpectedly. This fixes the minimized test case
(759573.xml).
* result/errors/759573-2.xml: Add.
* result/errors/759573-2.xml.err: Add.
* result/errors/759573-2.xml.str: Add.
* result/errors/759573.xml: Add.
* result/errors/759573.xml.err: Add.
* result/errors/759573.xml.str: Add.
* test/errors/759573-2.xml: Add.
* test/errors/759573.xml: Add.
For https://bugzilla.gnome.org/show_bug.cgi?id=759020
* parser.c:
(xmlParseStartTag2): Attribute strings are only valid if the
base does not change, so add another check where the base may
change. Make sure to set 'attvalue' to NULL after freeing it.
* result/errors/759020.xml: Added.
* result/errors/759020.xml.err: Added.
* result/errors/759020.xml.str: Added.
* test/errors/759020.xml: Added test case.
For https://bugzilla.gnome.org/show_bug.cgi?id=760263
* HTMLparser.c: Add BASE_PTR convenience macro.
(htmlParseSystemLiteral): Store length and start position instead
of a pointer while iterating through the public identifier since
the underlying buffer may change, resulting in a stale pointer
being used.
(htmlParsePubidLiteral): Ditto.
From https://bugzilla.gnome.org/show_bug.cgi?id=758518
Happens when a file has a name getting parsed, but no valid encoding
set, so libxml has to guess what the encoding is. This patch detects
when the buffer location changes, and if it does, restarts the parsing
of the name.
This slightly change a couple of regression tests output
For https://bugzilla.gnome.org/show_bug.cgi?id=761430
libfuzzer regression testing exposed another case where the parser would
fetch content of an external entity while not in validating mode.
Plug that hole