If called from xmlParseExternalEntity, oldctxt is NULL which leads to
a NULL deref if an error occurs. This only affects external code that
calls xmlParseExternalEntity.
Patch from David Kilzer with minor changes.
Fixes bug 780159.
Now that replacement of parameter entities goes exclusively through
xmlSkipBlankChars, we can account for the surrounding space characters
there and remove the "blanks wrapper" hack.
It's enough if xmlPopInput is called from xmlSkipBlankChars. Since the
replacement text of a parameter entity is surrounded with space
characters, that's the only place where the replacement can end in a
well-formed document.
This is also required to get rid of the "blanks wrapper" hack.
There are only two places where parameter entity references must be
handled. For the internal subset in xmlParseInternalSubset. For the
external subset or content from other external PEs in xmlSkipBlankChars.
Make sure that xmlSkipBlankChars skips over sequences of PEs and
whitespace. Rely on xmlSkipBlankChars instead of calling
xmlParsePEReference directly when in the external subset or a
conditional section.
xmlParserHandlePEReference is unused now.
Pop all extra input streams before resetting the input. Otherwise,
a call to xmlPopInput could make input available again.
Also set input->end to input->cur.
Changes the test output for some error tests. Unfortunately, some
fuzzed test cases were added to the test suite without manual cleanup.
This makes it almost impossible to review the impact of later changes
on the test output.
Make sure to finish all entities in the internal subset. Nevertheless,
readd a sanity check in xmlParseStartTag2 that was lost in my previous
commit. Also add a sanity check in xmlPopInput. Popping an input
unexpectedly was the source of many recent memory bugs. The check
doesn't mitigate such issues but helps with diagnosis.
Always base entity boundary checks on the input ID, not the input
pointer. The pointer could have been reallocated to the old address.
Always throw a well-formedness error if a boundary check fails. In a
few places, a validity error was thrown.
Fix a few error codes and improve indentation.
xmlParsePEReference is essentially a subset of
xmlParserHandlePEReference, so make xmlParserHandlePEReference call
xmlParsePEReference. The code paths in these functions differed
slighty, but the code from xmlParserHandlePEReference seems more solid
and tested.
Reset 'was_checked' to prevent entity from being parsed twice and SAX
callbacks being invoked twice if XML_PARSE_NOENT was set.
This regressed in version 2.9.3 and caused problems with WebKit.
Fixes bug 760367.
Make sure that xmlParseStringPEReference advances the "str" pointer
even if the parser was stopped. Otherwise xmlStringLenDecodeEntities
can loop infinitely.
Since we already successfully parsed the attribute name and other
items, it is guaranteed that we made progress in the input stream.
Comparing the input pointer to a previous value also looks fragile to
me. What if the input buffer was reallocated and the new "cur" pointer
happens to be the same as the old one? There are a couple of similar
checks which also take "consumed" into account. This seems to be safer
but I'm not convinced that it couldn't lead to false alarms in rare
situations.
When parsing the entity value, it can happen that an external entity
with an unsupported encoding is loaded and the parser is stopped. This
would lead to a memory leak.
A custom SAX callback could also stop the parser.
Found with libFuzzer and ASan.
There were two bugs where parameter-entity references could lead to an
unexpected change of the input buffer in xmlParseNameComplex and
xmlDictLookup being called with an invalid pointer.
Percent sign in DTD Names
=========================
The NEXTL macro used to call xmlParserHandlePEReference. When parsing
"complex" names inside the DTD, this could result in entity expansion
which created a new input buffer. The fix is to simply remove the call
to xmlParserHandlePEReference from the NEXTL macro. This is safe because
no users of the macro require expansion of parameter entities.
- xmlParseNameComplex
- xmlParseNCNameComplex
- xmlParseNmtoken
The percent sign is not allowed in names, which are grammatical tokens.
- xmlParseEntityValue
Parameter-entity references in entity values are expanded but this
happens in a separate step in this function.
- xmlParseSystemLiteral
Parameter-entity references are ignored in the system literal.
- xmlParseAttValueComplex
- xmlParseCharDataComplex
- xmlParseCommentComplex
- xmlParsePI
- xmlParseCDSect
Parameter-entity references are ignored outside the DTD.
- xmlLoadEntityContent
This function is only called from xmlStringLenDecodeEntities and
entities are replaced in a separate step immediately after the function
call.
This bug could also be triggered with an internal subset and double
entity expansion.
This fixes bug 766956 initially reported by Wei Lei and independently by
Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone
involved.
xmlParseNameComplex with XML_PARSE_OLD10
========================================
When parsing Names inside an expanded parameter entity with the
XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the
GROW macro if the input buffer was exhausted. At the end of the
parameter entity's replacement text, this function would then call
xmlPopInput which invalidated the input buffer.
There should be no need to invoke GROW in this situation because the
buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and,
at least for UTF-8, in xmlCurrentChar. This also matches the code path
executed when XML_PARSE_OLD10 is not set.
This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050).
Thanks to Marcel Böhme and Thuan Pham for the report.
Additional hardening
====================
A separate check was added in xmlParseNameComplex to validate the
buffer size.
The code in xmlParseStartTag2 must handle the case that the input
buffer was grown and reallocated which can invalidate pointers to
attribute values. Before, this was handled by detecting changes of
the input buffer "base" pointer and, in case of a change, jumping
back to the beginning of the function and reparsing the start tag.
The major problem of this approach is that whether an input buffer is
reallocated is nondeterministic, resulting in seemingly random test
failures. See the mailing list thread "runtest mystery bug: name2.xml
error case regression test" from 2012, for example.
If a reallocation was detected, the code also made no attempts to
continue parsing in case of errors which makes a difference in
the lax "recover" mode.
Now we store the current input buffer "base" pointer for each (not
separately allocated) attribute in the namespace URI field, which isn't
used until later. After the whole start tag was parsed, the pointers
to the attribute values are reconstructed using the offset between the
new and the old input buffer. This relies on arithmetic on dangling
pointers which is technically undefined behavior. But it seems like
the easiest and most efficient fix and a similar approach is used in
xmlParserInputGrow.
This changes the error output of several tests, typically making it
more verbose because we try harder to continue parsing in case of
errors.
(Another possible solution is to check not only the "base" pointer
but the size of the input buffer as well. But this would result in
even more reparsing.)
Triggered in mixed content ELEMENT declarations if there's an invalid
name after the first valid name:
<!ELEMENT para (#PCDATA|a|<invalid>)*>
Found with libFuzzer and ASan.
For https://bugzilla.gnome.org/show_bug.cgi?id=772726
* include/libxml/parser.h: Add a new parser flag XML_PARSE_NOXXE
* elfgcchack.h, xmlIO.h, xmlIO.c: associated loading routine
* include/libxml/xmlerror.h: new error raised
* xmllint.c: adds --noxxe flag to activate the option
For https://bugzilla.gnome.org/show_bug.cgi?id=762100
When we detect a recusive entity we should really not
build the associated data, moreover if someone bypass
libxml2 fatal errors and still tries to serialize a broken
entity make sure we don't risk to get ito a recursion
* parser.c: xmlParserEntityCheck() don't build if entity loop
were found and remove the associated text content
* tree.c: xmlStringGetNodeList() avoid a potential recursion
For https://bugzilla.gnome.org/show_bug.cgi?id=759573
* parser.c:
(xmlParseElementDecl): Return early on invalid input to fix
non-minimized test case (759573-2.xml). Otherwise the parser
gets into a bad state in SKIP(3) at the end of the function.
(xmlParseConditionalSections): Halt parsing when hitting invalid
input that would otherwise caused xmlParserHandlePEReference()
to recurse unexpectedly. This fixes the minimized test case
(759573.xml).
* result/errors/759573-2.xml: Add.
* result/errors/759573-2.xml.err: Add.
* result/errors/759573-2.xml.str: Add.
* result/errors/759573.xml: Add.
* result/errors/759573.xml.err: Add.
* result/errors/759573.xml.str: Add.
* test/errors/759573-2.xml: Add.
* test/errors/759573.xml: Add.
For https://bugzilla.gnome.org/show_bug.cgi?id=759020
* parser.c:
(xmlParseStartTag2): Attribute strings are only valid if the
base does not change, so add another check where the base may
change. Make sure to set 'attvalue' to NULL after freeing it.
* result/errors/759020.xml: Added.
* result/errors/759020.xml.err: Added.
* result/errors/759020.xml.str: Added.
* test/errors/759020.xml: Added test case.
For https://bugzilla.gnome.org/show_bug.cgi?id=761430
libfuzzer regression testing exposed another case where the parser would
fetch content of an external entity while not in validating mode.
Plug that hole
* parser.c:
(xmlParseNCNameComplex): Store start position instead of a
pointer to the name since the underlying buffer may change,
resulting in a stale pointer being used.
* result/errors/759398.xml: Added.
* result/errors/759398.xml.err: Added.
* result/errors/759398.xml.str: Added.
* test/errors/759398.xml: Added test case.
* parser.c:
(xmlParseEndTag2): Add bounds checks before dereferencing
ctxt->input->cur past the end of the buffer, or incrementing the
pointer past the end of the buffer.
* result/errors/758588.xml: Add test result.
* result/errors/758588.xml.err: Ditto.
* result/errors/758588.xml.str: Ditto.
* test/errors/758588.xml: Add regression test.
For https://bugzilla.gnome.org/show_bug.cgi?id=765207
CVE-2016-3705
The functions xmlParserEntityCheck() and xmlParseAttValueComplex() used to call
xmlStringDecodeEntities() in a recursive context without incrementing the
'depth' counter in the parser context. Because of that omission, the parser
failed to detect attribute recursions in certain documents before running out
of stack space.
* parser.c:
(xmlCheckCdataPush): Add 'complete' argument to describe whether
the buffer passed in is the whole CDATA buffer, or if there is
more data to parse. If there is more data to parse, don't
return a negative value for an invalid multi-byte UTF-8
character that is split between buffers.
(xmlParseTryOrFinish): Pass 'complete' argument to
xmlCheckCdataPush() as appropriate.
* result/cdata-2-byte-UTF-8.xml: Added.
* result/cdata-2-byte-UTF-8.xml.rde: Added.
* result/cdata-2-byte-UTF-8.xml.rdr: Added.
* result/cdata-2-byte-UTF-8.xml.sax: Added.
* result/cdata-2-byte-UTF-8.xml.sax2: Added.
* result/cdata-3-byte-UTF-8.xml: Added.
* result/cdata-3-byte-UTF-8.xml.rde: Added.
* result/cdata-3-byte-UTF-8.xml.rdr: Added.
* result/cdata-3-byte-UTF-8.xml.sax: Added.
* result/cdata-3-byte-UTF-8.xml.sax2: Added.
* result/cdata-4-byte-UTF-8.xml: Added.
* result/cdata-4-byte-UTF-8.xml.rde: Added.
* result/cdata-4-byte-UTF-8.xml.rdr: Added.
* result/cdata-4-byte-UTF-8.xml.sax: Added.
* result/cdata-4-byte-UTF-8.xml.sax2: Added.
* result/noent/cdata-2-byte-UTF-8.xml: Added.
* result/noent/cdata-3-byte-UTF-8.xml: Added.
* result/noent/cdata-4-byte-UTF-8.xml: Added.
* test/cdata-2-byte-UTF-8.xml: Added.
* test/cdata-3-byte-UTF-8.xml: Added.
* test/cdata-4-byte-UTF-8.xml: Added.
- Add tests and results. Only 'make Readertests XMLPushtests'
fails prior to the fix.
For https://bugzilla.gnome.org/show_bug.cgi?id=759671
when the end of the internal subset isn't properly detected
xmlParseInternalSubset should just return instead of trying
to process input further.
For https://bugzilla.gnome.org/show_bug.cgi?id=756525
handle properly the case where we popped out of the current entity
while processing a start tag
Reported by Kostya Serebryany @ Google
This slightly modifies the output of 754946 in regression tests
Unify the various place where either xmlStopParser was called
(which resets the error as a side effect) and places where we
used ctxt->instate = XML_PARSER_EOF to stop further processing
The problem is doing it in a consistent and safe fashion
It's more complex than just setting ctxt->instate = XML_PARSER_EOF
Update the public function to reuse that new internal routine
For https://bugzilla.gnome.org/show_bug.cgi?id=756733
It is one case where the code in place to detect entities expansions
failed to exit when the situation was detected, leading to DoS
Problem reported by Kostya Serebryany @ Google
Patch provided by David Drysdale @ Google
For https://bugzilla.gnome.org/show_bug.cgi?id=756527
and was also raised by Chromium team in the past
When we hit a convwersion failure when switching encoding
it is bestter to stop parsing there, this was treated as a
fatal error but the parser was continuing to process to extract
more errors, unfortunately that makes little sense as the data
is obviously corrupt and can potentially lead to unexpected behaviour.
an off by one mistake in the change, led to error on correct
document where the end of the included entity was exactly
the end of the conditional section, leading to regtest failure
Which happen after the previous fix to
https://bugzilla.gnome.org/show_bug.cgi?id=756456
But stopping the parser and exiting we didn't pop the intermediary entities
and doing the SKIP there applies on an input which may be too small
For https://bugzilla.gnome.org/show_bug.cgi?id=754946
When hitting the end of the current input buffer while parsing
a name we could end up loosing the beginning of the name, which
led to various issues.
For https://bugzilla.gnome.org/show_bug.cgi?id=751631
If we fail conversing the current input stream while
processing the encoding declaration of the XMLDecl
then it's safer to just abort there and not try to
report further errors.
For https://bugzilla.gnome.org/show_bug.cgi?id=744980
The error handling of Conditional Section also need to be
straightened as the structure of the document can't be
guessed on a failure there and it's better to stop parsing
as further errors are likely to be irrelevant.