mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00

990 Commits

Author SHA1 Message Date
Nick Wellnhofer
e91cbcf639 Don't read external entities or XIncludes from stdin
The file input callbacks try to read from stdin if "-" is passed as URL.
This should never be done when loading indirect resources like external
entities or XIncludes. Unfortunately, the stdin substitution happens
deep inside the IO code, so we simply replace "-" with "./-" in specific
locations.

This issue also affects other users of the library like libxslt.
Ideally, stdin should only be substituted on explicit request. But more
intrusive changes could break existing code.

Closes #90 and #102.
2019-09-20 13:26:51 +02:00
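The workaround in e91cbcf639 amounts to a small guard applied before an indirect resource (external entity or XInclude) reaches the I/O layer. This is a minimal C sketch. The helper name `indirect_resource_url` is hypothetical, and libxml2 performs the substitution inline at specific call sites rather than through a helper like this.

```c
#include <string.h>

/* Hypothetical guard for URLs of indirect resources such as external
 * entities or XIncludes. The I/O layer treats "-" as stdin, so a bare "-"
 * is rewritten to "./-", which names a file called "-" in the current
 * directory instead. All other URLs are returned unchanged. */
static const char *indirect_resource_url(const char *url)
{
    if (url != NULL && strcmp(url, "-") == 0)
        return "./-";
    return url;
}
```

The top-level document is deliberately not passed through such a guard, so "-" given on the command line can still mean stdin.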
Zhipeng Xie
5a02583c7e Fix memory leak in xmlParseBalancedChunkMemoryRecover
When doc is NULL, the namespace created in xmlTreeEnsureXMLDecl
is bound to newDoc->oldNs. In this case, setting newDoc->oldNs to
NULL and then freeing newDoc causes a memory leak.

Found with libFuzzer.

Closes #82.
2019-08-26 11:20:49 +02:00
Stephen Chenney
87125732cc Switched from unsigned long to ptrdiff_t in parser.c
Using unsigned long instead of ptrdiff_t results in non-zero
pointer deltas being stored as zero delta, giving incorrect offsets
into arrays and hence out of bounds reads.

This patch fixes the issue in all places in parser.c and adds a macro
to reduce the chances of cut-and-paste errors.

Only affects platforms where 'sizeof(long) < sizeof(size_t)' like
64-bit Windows.

See https://bugs.chromium.org/p/chromium/issues/detail?id=894933

Closes #44.
2019-07-08 13:00:12 +02:00
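The fix in 87125732cc depends on storing pointer deltas in a signed type that is as wide as a pointer difference. `unsigned long` is only 32 bits on 64-bit Windows, so a large delta can be truncated, while `ptrdiff_t` cannot be. The following is a minimal sketch. The `PTR_DELTA` macro and the `rebase_pointer` helper are hypothetical, and the macro actually added to parser.c may look different.

```c
#include <stddef.h>

/* Hypothetical macro: the signed distance between two pointers into the same
 * buffer. ptrdiff_t is the type of a pointer difference, so it is not
 * truncated where long is narrower than a pointer (e.g. 64-bit Windows). */
#define PTR_DELTA(base, cur) ((ptrdiff_t)((cur) - (base)))

/* Map a position in one buffer to the same offset in another buffer, e.g.
 * after the contents were copied into a larger allocation. The old buffer
 * must still be valid when this is called. */
static const char *rebase_pointer(const char *old_base, const char *old_cur,
                                  const char *new_base)
{
    ptrdiff_t delta = PTR_DELTA(old_base, old_cur);
    return new_base + delta;
}
```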
Nick Wellnhofer
01ea9c5af7 Fix another code path in xmlParseQName
Check for buffer errors in another code path missed in the previous
commit.

Found by OSS-Fuzz.
2019-07-08 11:29:40 +02:00
Nick Wellnhofer
5ccac8cecf Make sure that xmlParseQName returns NULL in error case
If there's an error growing the input buffer when recovering from
invalid QNames, make sure to return NULL. Otherwise, callers could be
confused. In xmlParseStartTag2, for example, `tlen` could become
negative.

Found by OSS-Fuzz.
2019-06-27 10:23:36 +02:00
Nick Wellnhofer
f9fce96313 Fix unsigned integer overflow
Unsigned integer overflow is defined behavior, but
-fsanitize=unsigned-integer-overflow is useful for discovering bugs.
2019-05-20 13:38:22 +02:00
David Warring
3c0d62b419 Fix parser termination from "Double hyphen within comment" error
This patch makes the parser halt immediately when the error handler
attempts to stop it.

Previously, the parser kept running and continued to reference the freed
buffer in the while loop's termination test.

This is only a problem if xmlStopParser is called from an error
handler. Probably caused by commit 123234f2. Fixes #58.
2019-05-14 15:55:12 +02:00
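The bug fixed in 3c0d62b419 is a loop that reads its buffer again after a callback may have halted the parser. This minimal sketch uses a hypothetical, simplified parser state, not libxml2's xmlParserCtxt. Halting sets the cursor to NULL, which stands in for the freed input buffer, so the stop flag must be checked before the loop condition dereferences the cursor again.

```c
#include <stddef.h>

/* Hypothetical, simplified parser state; not libxml2's xmlParserCtxt. */
struct parser {
    const char *cur;                     /* NULL once the parser is halted */
    int stopped;
    int errors;
    void (*on_error)(struct parser *p);  /* user error handler, may be NULL */
};

/* Stand-in for halting: the input is unusable afterwards. */
static void halt(struct parser *p)
{
    p->stopped = 1;
    p->cur = NULL;
}

/* Example error handler that stops parsing on the first error. */
static void stop_on_error(struct parser *p)
{
    halt(p);
}

/* Scan a comment body up to "-->". Returns 0 at the end of the comment,
 * -1 if the comment is unterminated or parsing was stopped. */
static int scan_comment(struct parser *p)
{
    while (*p->cur != '\0') {
        if (p->cur[0] == '-' && p->cur[1] == '-') {
            if (p->cur[2] == '>')
                return 0;                /* end of comment */
            p->errors++;                 /* double hyphen within comment */
            if (p->on_error != NULL)
                p->on_error(p);
            if (p->stopped)
                return -1;               /* the fix: leave before touching p->cur */
            p->cur += 2;
            continue;
        }
        p->cur++;
    }
    return -1;                           /* unterminated comment */
}
```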
Nick Wellnhofer
b48226f78c Fix memory leaks in xmlParseStartTag2 error paths
Found by OSS-Fuzz.
2019-01-07 18:07:00 +01:00
Nick Wellnhofer
8919885ff9 Fix -Wformat-truncation warnings (GCC 8)
2019-01-06 14:24:59 +01:00
Nick Wellnhofer
123234f2cf Free input buffer in xmlHaltParser
This avoids miscalculation of available bytes.

Thanks to Yunho Kim for the report.

Closes: #26
2018-09-11 15:06:17 +02:00
Nick Wellnhofer
707ad080e6 Fix xmlParserEntityCheck
A previous commit removed the check for XML_ERR_ENTITY_LOOP which is
required to abort early in case of excessive entity recursion.
2018-01-23 16:37:54 +01:00
Nick Wellnhofer
ab362ab0ad Halt parser in case of encoding error
Should fix crbug.com/793715, although I wasn't able to reproduce the
issue.
2018-01-22 15:42:26 +01:00
Nick Wellnhofer
60dded12cb Clear entity content in case of errors
This only affects recovery mode and avoids integer overflow in
xmlStringGetNodeList and possibly other nasty surprises.

See bug 783052 and

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3874
https://bugs.chromium.org/p/chromium/issues/detail?id=796804
2018-01-22 15:23:22 +01:00
Nick Wellnhofer
132af1a0d1 Fix buffer over-read in xmlParseNCNameComplex
Calling GROW can halt the parser if the buffer grows too large. This
will set the buffer to an empty string. Return immediately in this case,
otherwise the "current" pointer is advanced leading to a buffer over-read.

Found with OSS-Fuzz. See

https://oss-fuzz.com/testcase?key=6683819592646656
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=5031
2018-01-08 18:48:01 +01:00
Daniel Veillard
ad88b54f1a Improve handling of context input_id
For https://bugzilla.gnome.org/show_bug.cgi?id=772726
xmlsec used this to detect and prevent access to external entities,
but it was unreliable. Based on a patch from Aleksey Sanin.

* parser.c: make sure input_id is incremented when creating sub-entities
            for parsing or when parsing out of context
2017-12-08 09:42:31 +01:00
Nick Wellnhofer
cb5541c9f3 Fix libz and liblzma detection
If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must
not be run because the correct CPPFLAGS aren't set. It is actually not
required to have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H.
Only check for LIBXML_ZLIB_ENABLED and remove HAVE_ZLIB_H macro.

Fixes bug 764657, bug 787041.
2017-11-27 14:33:37 +01:00
Nick Wellnhofer
e03f0a199a Fix hash callback signatures
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.

Fixes bug 784861.
2017-11-09 16:42:47 +01:00
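The requirement in e03f0a199a that callbacks "exactly match the callback function type" can be shown with a minimal sketch. The callback type below is hypothetical and only loosely modelled on a hash-table scanner. The callback is defined with the typedef's exact parameter types and converts `void *` inside its body. It is not declared with more specific pointer types and then cast. In C, calling a function through a pointer of an incompatible type is undefined behavior, and clang's CFI checks trap exactly that.

```c
#include <stddef.h>

/* Hypothetical callback type, loosely modelled on a hash-table scanner. */
typedef void (*scan_fn)(void *payload, void *data, const char *name);

/* Correct: the definition matches scan_fn exactly; the int * conversion
 * happens inside the body. Declaring this as
 * void count_entry(void *payload, int *counter, const char *name) and casting
 * it to scan_fn would compile, but calling it through scan_fn is undefined
 * behavior and fails clang's -fsanitize=cfi-icall. */
static void count_entry(void *payload, void *data, const char *name)
{
    int *counter = data;
    (void)payload;
    (void)name;
    (*counter)++;
}

/* Invoke fn once per entry, as a hash table scan would. */
static void scan_all(void **payloads, const char **names, size_t n,
                     scan_fn fn, void *data)
{
    for (size_t i = 0; i < n; i++)
        fn(payloads[i], data, names[i]);
}
```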
Vlad Tsyrklevich
28f52fe89d Refactor name and type signature for xmlNop
Update xmlNop's name to xmlInputReadCallbackNop and its type signature
to match xmlInputReadCallback.

Fixes bug 786134.
2017-11-09 13:43:08 +01:00
Nick Wellnhofer
e3890546d7 Fix the Windows header mess
Don't include windows.h and wsockcompat.h from config.h; include them
only where needed.

Don't define _WINSOCKAPI_ manually. This was apparently done to stop
windows.h from including winsock.h which is a problem if winsock2.h
wasn't included first. But on MinGW, this causes compiler warnings.
Define WIN32_LEAN_AND_MEAN instead which has the same effect.

Always use the compiler-defined _WIN32 macro instead of WIN32.
2017-10-09 14:35:40 +02:00
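In a source file that needs Winsock, the include discipline described in e3890546d7 might look like the following minimal sketch. The `platform_has_winsock` function is hypothetical and is not part of libxml2. It exists only so the fragment compiles to something on every platform.

```c
/* Use the compiler-defined _WIN32 macro, not WIN32. */
#ifdef _WIN32
/* Keeps windows.h from pulling in the old winsock.h, which would clash with
 * winsock2.h; this replaces defining _WINSOCKAPI_ by hand. */
#define WIN32_LEAN_AND_MEAN
#include <winsock2.h>
#include <windows.h>
#endif

/* Hypothetical helper: reports whether the Winsock headers were included. */
static int platform_has_winsock(void)
{
#ifdef _WIN32
    return 1;
#else
    return 0;
#endif
}
```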
Nick Wellnhofer
d422b954be Fix pointer/int cast warnings on 64-bit Windows
On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer.
Switch to ptrdiff_t instead which should be the same size as a pointer
on every somewhat sane platform without requiring C99 types like
intptr_t.

Fixes bug 788312.

Thanks to J. Peter Mugaas for the report and initial patch.
2017-10-09 13:47:49 +02:00
Nick Wellnhofer
b90d8989d3 Fix regression with librsvg
Instead of using xmlCreateIOParserCtxt, librsvg pushes its own
xmlParserInput on top of a memory push parser. This incorrect use of
the API confuses several parser checks and, since 2.9.5, completely
breaks documents with internal subsets. Work around the problem with
internal subsets.

Thanks to Petr Sumbera for the report:

    https://mail.gnome.org/archives/xml/2017-September/msg00011.html

Also see

    https://bugzilla.gnome.org/show_bug.cgi?id=787895
2017-09-19 16:45:49 +02:00
Nick Wellnhofer
abbda93c72 Handle more invalid entity values in recovery mode
In attribute content, don't emit entity references if there are
problems with the entity value. Otherwise some illegal entity values
like

    <!ENTITY a '&#38;#x123456789;'>

would later cause problems like integer overflow.

Make xmlStringLenDecodeEntities return NULL on more error conditions
including invalid char refs and errors from recursive calls. Remove
some fragile error checks based on lastError that shouldn't be
needed now. Clear the entity content in xmlParseAttValueComplex if
an error was found.

Found by OSS-Fuzz. Should fix bug 783052.

Also see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3343
2017-09-13 17:21:04 +02:00
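In the entity value quoted in abbda93c72, `&#38;` expands to `&`. That leaves the character reference `&#x123456789;`, whose value does not fit in 32 bits. The commit handles this by making the decoding functions return NULL on error. A complementary, lower-level defense is to reject an out-of-range reference before the value can overflow. This minimal sketch assumes a hypothetical `parse_hex_charref` that receives the digits after `&#x`. It is not libxml2's code.

```c
#include <stdint.h>

/* Hypothetical: parse the hex digits of a character reference such as
 * "&#x123456789;", given a pointer just past "&#x". Returns the code point,
 * or 0 if a digit is invalid, no digit is present, or the value exceeds
 * U+10FFFF. */
static uint32_t parse_hex_charref(const char *s)
{
    uint32_t val = 0;
    int digits = 0;

    for (; *s != ';' && *s != '\0'; s++) {
        uint32_t d;
        if (*s >= '0' && *s <= '9')
            d = (uint32_t)(*s - '0');
        else if (*s >= 'a' && *s <= 'f')
            d = (uint32_t)(*s - 'a' + 10);
        else if (*s >= 'A' && *s <= 'F')
            d = (uint32_t)(*s - 'A' + 10);
        else
            return 0;
        val = val * 16 + d;
        digits++;
        /* Checking every step keeps val <= 0x10FFFF before the next
         * multiplication, so the accumulator can never wrap around. */
        if (val > 0x10FFFF)
            return 0;
    }
    return digits > 0 ? val : 0;
}
```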
Nick Wellnhofer
0fcab658a2 Handle illegal entity values in recovery mode
Make xmlParseEntityValue always return NULL on error. Otherwise some
illegal entity values like

    <!ENTITY e '&%#4294967298;'>

would later cause problems like integer overflow.

Found by OSS-Fuzz. Should fix bug 783052.

Also see

    https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=592
    https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=2732
2017-09-07 19:08:43 +02:00
Nick Wellnhofer
69936b129f Revert "Print error messages for truncated UTF-8 sequences"
This reverts commit 79c8a6b which caused a serious regression in
streaming mode.

Also reverts part of commit 52ceced "Fix infinite loops with push
parser in recovery mode".

Fixes bug 786554.
2017-08-30 14:19:06 +02:00
Stéphane Michaut
454e397eb7 Porting libxml2 on zOS encoding of code
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
  request conversion of the code to ISO Latin 1 so that the compiler does not
  assume EBCDIC code points for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special; avoid some of the expectations from
  Unix/Windows
2017-08-28 14:30:43 +02:00
Nick Wellnhofer
899a5d9f0e Detect infinite recursion in parameter entities
When expanding a parameter entity in a DTD, infinite recursion could
lead to an infinite loop or memory exhaustion.

Thanks to Wei Lei for the first of many reports.

Fixes bug 759579.
2017-07-25 15:21:12 +02:00
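A common way to detect the infinite recursion described in 899a5d9f0e is to mark each entity while its replacement text is being expanded and to fail if that entity is entered again. This minimal sketch uses a hypothetical entity table in which each entity references at most one other entity. libxml2's own detection may work differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entity record: the replacement text is reduced to at most
 * one reference to another entity, which is enough to show loop detection. */
struct entity {
    const char *name;
    const char *ref;      /* name of the referenced entity, or NULL */
    int expanding;        /* 1 while this entity is being expanded */
};

static struct entity *find_entity(struct entity *tab, size_t n,
                                  const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Returns 0 if expansion terminates, -1 if an entity references itself,
 * directly or indirectly. */
static int expand_entity(struct entity *tab, size_t n, struct entity *e)
{
    int ret = 0;

    if (e->expanding)
        return -1;                 /* entered again: infinite recursion */
    e->expanding = 1;
    if (e->ref != NULL) {
        struct entity *next = find_entity(tab, n, e->ref);
        if (next != NULL)
            ret = expand_entity(tab, n, next);
    }
    e->expanding = 0;              /* allow later, non-nested references */
    return ret;
}
```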
Nick Wellnhofer
52ceced6e7 Fix infinite loops with push parser in recovery mode
Make sure that the input pointer advances in case of errors. Otherwise,
the push parser can loop infinitely.

Found with libFuzzer.
2017-07-04 18:51:29 +02:00
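The rule in 52ceced6e7, that the input pointer must advance on errors, can be illustrated with a toy recovery loop. This is a minimal sketch. `parse_item` and `parse_all_recover` are hypothetical and stand in for the much larger libxml2 parsing functions.

```c
#include <stddef.h>

/* Hypothetical item parser: accepts a run of lowercase letters and returns
 * the number of bytes consumed, or 0 on error (no progress). */
static size_t parse_item(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Recovery-mode loop: count items, and whenever an error leaves the input
 * pointer where it was, skip one byte so that the loop always advances. */
static size_t parse_all_recover(const char *s, size_t *errors)
{
    size_t items = 0;

    *errors = 0;
    while (*s != '\0') {
        size_t n = parse_item(s);
        if (n == 0) {
            (*errors)++;
            s++;              /* without this, the loop would never end */
            continue;
        }
        items++;
        s += n;
    }
    return items;
}
```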
Nick Wellnhofer
3eef3f39a6 Fix NULL deref in xmlParseExternalEntityPrivate
If called from xmlParseExternalEntity, oldctxt is NULL which leads to
a NULL deref if an error occurs. This only affects external code that
calls xmlParseExternalEntity.

Patch from David Kilzer with minor changes.

Fixes bug 780159.
2017-06-20 16:13:57 +02:00
Nick Wellnhofer
872fea9485 Get rid of "blanks wrapper" for parameter entities
Now that replacement of parameter entities goes exclusively through
xmlSkipBlankChars, we can account for the surrounding space characters
there and remove the "blanks wrapper" hack.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
d9e43c7db5 Make sure not to call IS_BLANK_CH when parsing the DTD
This is required to get rid of the "blanks wrapper" hack. Checking the
return value of xmlSkipBlankChars is more efficient, too.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
453dff1e3b Remove unnecessary calls to xmlPopInput
It's enough if xmlPopInput is called from xmlSkipBlankChars. Since the
replacement text of a parameter entity is surrounded with space
characters, that's the only place where the replacement can end in a
well-formed document.

This is also required to get rid of the "blanks wrapper" hack.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
aa267cd127 Simplify handling of parameter entity references
There are only two places where parameter entity references must be
handled: the internal subset in xmlParseInternalSubset, and the
external subset or content from other external PEs in xmlSkipBlankChars.

Make sure that xmlSkipBlankChars skips over sequences of PEs and
whitespace. Rely on xmlSkipBlankChars instead of calling
xmlParsePEReference directly when in the external subset or a
conditional section.

xmlParserHandlePEReference is unused now.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
24246c7626 Fix xmlHaltParser
Pop all extra input streams before resetting the input. Otherwise,
a call to xmlPopInput could make input available again.

Also set input->end to input->cur.

Changes the test output for some error tests. Unfortunately, some
fuzzed test cases were added to the test suite without manual cleanup.
This makes it almost impossible to review the impact of later changes
on the test output.
2017-06-20 13:15:43 +02:00
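The order of operations in 24246c7626 (pop extra inputs first, then set `end` to `cur`) can be sketched with a hypothetical, simplified input stack. In this minimal sketch, popping only decrements the depth, where real code would also free the popped inputs.

```c
#include <stddef.h>

/* Hypothetical, simplified input stack; not libxml2's structures. */
struct input {
    const char *cur;
    const char *end;
};

struct ctx {
    struct input stack[8];
    int depth;            /* number of inputs; stack[depth - 1] is current */
    int halted;
};

/* Halt: first pop every extra input stream (e.g. entity replacement text),
 * then make the remaining input empty by setting end to cur, so no later
 * pop can expose unread data. */
static void halt_parser(struct ctx *c)
{
    while (c->depth > 1)
        c->depth--;
    if (c->depth == 1)
        c->stack[0].end = c->stack[0].cur;
    c->halted = 1;
}

/* Bytes left in the current input. */
static size_t bytes_available(const struct ctx *c)
{
    const struct input *in;

    if (c->depth == 0)
        return 0;
    in = &c->stack[c->depth - 1];
    return (size_t)(in->end - in->cur);
}
```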
Nick Wellnhofer
8bbe4508ef Spelling and grammar fixes
Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other
misspellings.
2017-06-17 16:34:23 +02:00
Nick Wellnhofer
5f440d8cad Rework entity boundary checks
Make sure to finish all entities in the internal subset. Nevertheless,
re-add a sanity check in xmlParseStartTag2 that was lost in my previous
commit. Also add a sanity check in xmlPopInput. Popping an input
unexpectedly was the source of many recent memory bugs. The check
doesn't mitigate such issues but helps with diagnosis.

Always base entity boundary checks on the input ID, not the input
pointer. The pointer could have been reallocated to the old address.

Always throw a well-formedness error if a boundary check fails. In a
few places, a validity error was thrown.

Fix a few error codes and improve indentation.
2017-06-17 13:25:53 +02:00
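The argument in 5f440d8cad is that a new input can be allocated at the address of an old one, so boundary checks should compare IDs rather than pointers. This minimal sketch assumes a hypothetical `struct input` whose `id` is assigned from a counter. A real parser would keep that counter in the parser context, not in a static variable.

```c
#include <stdlib.h>

/* Hypothetical input record; only the ID matters here. */
struct input {
    int id;
    /* ... buffer pointers would go here ... */
};

static int last_input_id = 0;   /* per-parser in real code */

static struct input *new_input(void)
{
    struct input *in = malloc(sizeof *in);
    if (in != NULL)
        in->id = ++last_input_id;   /* never reused, unlike addresses */
    return in;
}

/* Entity boundary check: an element must end in the input it started in.
 * Comparing pointers could wrongly succeed if a freed input's memory was
 * reused for a new one; comparing IDs cannot. */
static int same_input(int start_id, const struct input *cur)
{
    return cur->id == start_id;
}
```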
Nick Wellnhofer
46dc989080 Don't switch encoding for internal parameter entities
This is only needed for external entities. Trying to switch the encoding
for internal entities could also cause a memory leak in recovery mode.
2017-06-17 13:23:40 +02:00
Nick Wellnhofer
03904159f8 Merge duplicate code paths handling PE references
xmlParsePEReference is essentially a subset of
xmlParserHandlePEReference, so make xmlParserHandlePEReference call
xmlParsePEReference. The code paths in these functions differed
slightly, but the code from xmlParserHandlePEReference seems more solid
and tested.
2017-06-17 13:22:37 +02:00
David Kilzer
3f0627a1ee Fix duplicate SAX callbacks for entity content
Reset 'was_checked' to prevent entity from being parsed twice and SAX
callbacks being invoked twice if XML_PARSE_NOENT was set.

This regressed in version 2.9.3 and caused problems with WebKit.

Fixes bug 760367.
2017-06-16 21:30:42 +02:00
Nick Wellnhofer
fb2f518cc6 Fix potential infinite loop in xmlStringLenDecodeEntities
Make sure that xmlParseStringPEReference advances the "str" pointer
even if the parser was stopped. Otherwise xmlStringLenDecodeEntities
can loop infinitely.
2017-06-10 18:00:55 +02:00
Nick Wellnhofer
4ba8cc856b Remove useless check in xmlParseAttributeListDecl
Since we already successfully parsed the attribute name and other
items, it is guaranteed that we made progress in the input stream.

Comparing the input pointer to a previous value also looks fragile to
me. What if the input buffer was reallocated and the new "cur" pointer
happens to be the same as the old one? There are a couple of similar
checks which also take "consumed" into account. This seems to be safer
but I'm not convinced that it couldn't lead to false alarms in rare
situations.
2017-06-10 17:52:59 +02:00
Nick Wellnhofer
bedbef8065 Fix memory leak in xmlParseEntityDecl error path
When parsing the entity value, it can happen that an external entity
with an unsupported encoding is loaded and the parser is stopped. This
would lead to a memory leak.

A custom SAX callback could also stop the parser.

Found with libFuzzer and ASan.
2017-06-10 17:42:52 +02:00
Nick Wellnhofer
030b1f7a27 Revert "Add an XML_PARSE_NOXXE flag to block all entities loading even local"
This reverts commit 2304078555.

The new flag doesn't work and the change even broke the XML_PARSE_NONET
option.
2017-06-06 15:53:42 +02:00
Nick Wellnhofer
e26630548e Fix handling of parameter-entity references
There were two bugs where parameter-entity references could lead to an
unexpected change of the input buffer in xmlParseNameComplex and
xmlDictLookup being called with an invalid pointer.

Percent sign in DTD Names
=========================

The NEXTL macro used to call xmlParserHandlePEReference. When parsing
"complex" names inside the DTD, this could result in entity expansion
which created a new input buffer. The fix is to simply remove the call
to xmlParserHandlePEReference from the NEXTL macro. This is safe because
no users of the macro require expansion of parameter entities.

- xmlParseNameComplex
- xmlParseNCNameComplex
- xmlParseNmtoken

The percent sign is not allowed in names, which are grammatical tokens.

- xmlParseEntityValue

Parameter-entity references in entity values are expanded but this
happens in a separate step in this function.

- xmlParseSystemLiteral

Parameter-entity references are ignored in the system literal.

- xmlParseAttValueComplex
- xmlParseCharDataComplex
- xmlParseCommentComplex
- xmlParsePI
- xmlParseCDSect

Parameter-entity references are ignored outside the DTD.

- xmlLoadEntityContent

This function is only called from xmlStringLenDecodeEntities and
entities are replaced in a separate step immediately after the function
call.

This bug could also be triggered with an internal subset and double
entity expansion.

This fixes bug 766956 initially reported by Wei Lei and independently by
Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone
involved.

xmlParseNameComplex with XML_PARSE_OLD10
========================================

When parsing Names inside an expanded parameter entity with the
XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the
GROW macro if the input buffer was exhausted. At the end of the
parameter entity's replacement text, this function would then call
xmlPopInput which invalidated the input buffer.

There should be no need to invoke GROW in this situation because the
buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and,
at least for UTF-8, in xmlCurrentChar. This also matches the code path
executed when XML_PARSE_OLD10 is not set.

This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050).
Thanks to Marcel Böhme and Thuan Pham for the report.

Additional hardening
====================

A separate check was added in xmlParseNameComplex to validate the
buffer size.
2017-06-05 18:38:33 +02:00
Nick Wellnhofer
855c19efb7 Avoid reparsing in xmlParseStartTag2
The code in xmlParseStartTag2 must handle the case that the input
buffer was grown and reallocated which can invalidate pointers to
attribute values. Before, this was handled by detecting changes of
the input buffer "base" pointer and, in case of a change, jumping
back to the beginning of the function and reparsing the start tag.

The major problem of this approach is that whether an input buffer is
reallocated is nondeterministic, resulting in seemingly random test
failures. See the mailing list thread "runtest mystery bug: name2.xml
error case regression test" from 2012, for example.

If a reallocation was detected, the code also made no attempts to
continue parsing in case of errors which makes a difference in
the lax "recover" mode.

Now we store the current input buffer "base" pointer for each (not
separately allocated) attribute in the namespace URI field, which isn't
used until later. After the whole start tag was parsed, the pointers
to the attribute values are reconstructed using the offset between the
new and the old input buffer. This relies on arithmetic on dangling
pointers which is technically undefined behavior. But it seems like
the easiest and most efficient fix and a similar approach is used in
xmlParserInputGrow.

This changes the error output of several tests, typically making it
more verbose because we try harder to continue parsing in case of
errors.

(Another possible solution is to check not only the "base" pointer
but the size of the input buffer as well. But this would result in
even more reparsing.)
2017-06-01 14:31:28 +02:00
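As the 855c19efb7 message says, rebuilding attribute-value pointers from the old "base" pointer relies on arithmetic with dangling pointers. An alternative technique avoids that undefined behavior. It records attribute positions as offsets from the buffer start and turns them into pointers only after the buffer has stopped moving. This minimal sketch illustrates that alternative, not what the commit does. The growable `struct input` and `struct attr` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable input buffer. */
struct input {
    char *base;
    size_t size;      /* bytes used, excluding the terminating NUL */
    size_t cap;
};

/* Hypothetical attribute record: the value is stored as an offset into the
 * buffer, which stays valid when realloc moves the data. */
struct attr {
    size_t off;
    size_t len;
};

/* Append a string, growing (and possibly moving) the buffer. */
static int input_append(struct input *in, const char *data)
{
    size_t n = strlen(data);

    if (in->size + n + 1 > in->cap) {
        size_t cap = (in->size + n + 1) * 2;
        char *p = realloc(in->base, cap);
        if (p == NULL)
            return -1;
        in->base = p;
        in->cap = cap;
    }
    memcpy(in->base + in->size, data, n + 1);
    in->size += n;
    return 0;
}

/* Recompute the value pointer from the current base. */
static const char *attr_value(const struct input *in, const struct attr *a)
{
    return in->base + a->off;
}
```

The trade-off is that every attribute access needs the current buffer, whereas the approach in the commit keeps plain pointers during parsing and fixes them up once at the end.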
Nick Wellnhofer
07b7428b69 Simplify control flow in xmlParseStartTag2
Remove some goto labels and deduplicate a bit of code after handling
namespaces.

Before:

    loop {
        parseAttribute
        if (ok) {
            if (defaultNamespace) {
                handleDefaultNamespace
                if (error)
                    goto skip_default_ns;
                handleDefaultNamespace
    skip_default_ns:
                freeAttr
                nextAttr
                continue;
            }
            if (namespace) {
                handleNamespace
                if (error)
                    goto skip_ns;
                handleNamespace
    skip_ns:
                freeAttr
                nextAttr
                continue;
            }
            handleAttr
        } else {
            freeAttr
        }
        nextAttr
    }

After:

    loop {
        parseAttribute
        if (!ok)
            goto next_attr;
        if (defaultNamespace) {
            handleDefaultNamespace
            if (error)
                goto next_attr;
            handleDefaultNamespace
        } else if (namespace) {
            handleNamespace
            if (error)
                goto next_attr;
            handleNamespace
        } else {
            handleAttr
        }
    next_attr:
        freeAttr
        nextAttr
    }
2017-06-01 14:31:28 +02:00
Nick Wellnhofer
474967241c Avoid spurious UBSan errors in parser.c
If available, use a C99 flexible array member to avoid spurious UBSan
errors.
2017-06-01 14:31:27 +02:00
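The technique in 474967241c, a C99 flexible array member with a fallback for older compilers, can be sketched as follows. The struct and function names are hypothetical and do not reflect the structure actually changed in parser.c.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical variable-length record. With C99, names[] is a flexible
 * array member; older compilers get the classic one-element "struct hack",
 * where indexing past names[0] is what UBSan reports as out of bounds. */
struct name_list {
    size_t count;
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    const char *names[];
#else
    const char *names[1];
#endif
};

/* Allocate room for n names; offsetof works for both declarations. */
static struct name_list *name_list_new(size_t n)
{
    struct name_list *l =
        malloc(offsetof(struct name_list, names) + n * sizeof(const char *));
    if (l != NULL)
        l->count = n;
    return l;
}
```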
Nick Wellnhofer
8627e4ed20 Fix memory leak in parser error path
Triggered in mixed content ELEMENT declarations if there's an invalid
name after the first valid name:

    <!ELEMENT para (#PCDATA|a|<invalid>)*>

Found with libFuzzer and ASan.
2017-05-27 15:59:18 +02:00
Neel Mehta
90ccb58242 Prevent unwanted external entity reference
For https://bugzilla.gnome.org/show_bug.cgi?id=780691

* parser.c: add a specific check to avoid PE reference
2017-04-07 17:45:14 +02:00
Doran Moppert
2304078555 Add an XML_PARSE_NOXXE flag to block all entities loading even local
For https://bugzilla.gnome.org/show_bug.cgi?id=772726

* include/libxml/parser.h: Add a new parser flag XML_PARSE_NOXXE
* elfgcchack.h, xmlIO.h, xmlIO.c: associated loading routine
* include/libxml/xmlerror.h: new error raised
* xmllint.c: adds --noxxe flag to activate the option
2017-04-07 16:55:05 +02:00
Daniel Veillard
bdd66182ef Avoid building recursive entities
For https://bugzilla.gnome.org/show_bug.cgi?id=762100

When we detect a recursive entity we should really not
build the associated data. Moreover, if someone bypasses
libxml2 fatal errors and still tries to serialize a broken
entity, make sure we don't risk getting into a recursion.

* parser.c: xmlParserEntityCheck() doesn't build if entity loops
  were found and removes the associated text content
* tree.c: xmlStringGetNodeList() avoid a potential recursion
2016-05-23 15:01:07 +08:00