1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-28 10:01:00 +03:00
Commit Graph

368 Commits

Author SHA1 Message Date
Nick Wellnhofer
9edc20c154 Fix double counting of CRLF in comments
Fixes #151.
2022-02-07 20:54:07 +01:00
Nick Wellnhofer
5408c10c37 Don't normalize namespace URIs in XPointer xmlns() scheme
Namespace URIs should be compared without escaping or unescaping:

https://www.w3.org/TR/REC-xml-names/#NSNameComparison

Fixes #289.
2022-02-04 14:00:09 +01:00
Nick Wellnhofer
1c7d91abe4 Fix handling of XSD with empty namespace
An empty namespace means no default namespace.

Fixes #303.
2022-02-03 23:31:19 +01:00
Nick Wellnhofer
f480f7509c Update NewsML DTD in test suite
Switch to version 1.2 which has a clearer license.

Fixes #291.
2022-02-03 14:43:17 +01:00
Nick Wellnhofer
d85245f934 Fix regression with PEs in external DTD
Fix a regression introduced with commit a28f7d87. In some cases,
parameter entity references in external DTDs wouldn't be expanded.

Fixes #306.
2022-01-16 21:56:10 +01:00
David Kilzer
03bb929390 Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().

* encoding.c:
(UTF16LEToUTF8):
- Fix comment to describe what the code does.
(UTF16BEToUTF8):
- Fix undefined behavior which was applied to UTF16LEToUTF8() in
  2f9382033e.
- Add bounds check to while() loop which was applied to
  UTF16LEToUTF8() in be803967db.
- Do not return -2 when (in >= inend) to fix the bug.  This was
  applied to UTF16LEToUTF8() in 496a1cf592.
- Inline (<< 8) statements to match UTF16LEToUTF8().

Add the following tests and results:

  test/text-4-byte-UTF-16-BE-offset.xml
  test/text-4-byte-UTF-16-BE.xml
  test/text-4-byte-UTF-16-LE-offset.xml
  test/text-4-byte-UTF-16-LE.xml
2022-01-16 14:07:17 +01:00
Nick Wellnhofer
2732b23466 Fix regression parsing public IDs literals in HTML
Fix regression introduced when reworking htmlParsePubidLiteral in
commit 93ce33c2.

Fixes #318.
2022-01-10 13:37:59 +01:00
Nick Wellnhofer
01411e7c5e Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
2021-02-08 21:51:26 +01:00
Mike Dalessio
e28d9347bc add test coverage for incorrectly-closed comments
this establishes the baseline behavior so that subsequent commits
which modify this behavior are clear about what's being changed.
2020-12-16 16:12:07 +01:00
Nick Wellnhofer
87d20b554c Fix regression introduced with commit 74dcc10b
The code wasn't dead after all, but I can see no reason in delaying
the XPointer evaluation. This could lead to nodes included earlier
appearing in XPointer results.
2020-08-19 13:52:08 +02:00
Nick Wellnhofer
d88df4bd48 Fix corner case with empty xi:fallback
xi:fallback could become empty after recursive expansion. Use a flag
to track whether nodes should be skipped.
2020-08-17 01:17:39 +02:00
Nick Wellnhofer
1abf2967f9 Fix exponential runtime and memory in xi:fallback processing
When creating XML_XINCLUDE_START nodes, the children of the original
xi:include node must be freed, otherwise fallback content is copied
twice, doubling runtime and memory consumption for each nested
xi:fallback/xi:include pair.

Found with libFuzzer.
2020-08-07 19:59:07 +02:00
Nick Wellnhofer
0f9817c75b Don't recurse into xi:include children in xmlXIncludeDoProcess
Otherwise, nested xi:include nodes might result in a use-after-free
if XML_PARSE_NOXINCNODE is specified.

Found with libFuzzer and ASan.
2020-08-06 14:29:33 +02:00
David Kilzer
6b4717d61d Add regexp regression tests
- Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup
  <https://bugzilla.gnome.org/show_bug.cgi?id=757711>
- Bug 783015 - Integer-overflow in xmlFAParseQuantExact
  <https://bugzilla.gnome.org/show_bug.cgi?id=783015>

(Regexptests): Add support for checking stderr output when
running regexp tests.  This makes it possible to check in test
cases that fail and not see false-positive error output when
running the tests.  Unlike other libxml2 test suites, if there
is no stderr output, no *.err file needs to be created.
2020-07-06 12:37:53 +02:00
Nick Wellnhofer
477c7f6aff Fix quadratic runtime in HTML parser
Commit eeb99329 removed an important optimization avoiding quadratic
runtime when repeatedly scanning the input buffer for terminating
characters in the HTML push parser. The related bug is

    https://bugzilla.gnome.org/show_bug.cgi?id=444994

Make sure that ctxt->checkIndex is always written and store additional
parser state in ctxt->inSubset which is unused in the HTML parser.

Found by OSS-Fuzz.
2020-07-06 12:17:20 +02:00
Nick Wellnhofer
32cb5dccda Add test case for recursive external parsed entities 2020-02-11 17:36:43 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
c51e38cb3a Make xmlParseConditionalSections non-recursive
Avoid call stack overflow in deeply nested conditional sections.

Found by OSS-Fuzz.
2019-09-30 15:47:30 +02:00
Nick Wellnhofer
c2b0a184a9 Fix empty branch in regex
Fixes bug 649244:
https://bugzilla.gnome.org/show_bug.cgi?id=649244

Closes #57.
2019-09-25 14:22:47 +02:00
Nick Wellnhofer
6705f4d28e Remove executable bit from non-executable files 2019-09-16 15:48:59 +02:00
Nick Wellnhofer
e8c9cd5c7a Fix Schema determinism check of ##other namespaces
Non-compound (##local) and compound string atoms are always disjoint
regardless of whether the compound atom is negated (##other).

Closes #40.
2019-09-16 15:36:02 +02:00
Nick Wellnhofer
8efc5b283c 14:00 is a valid timezone for xs:dateTime
Closes #100
2019-09-13 12:24:23 +02:00
Jan Pokorný
ea695ac0d6 Fix unability to RelaxNG-validate grammar with choice-based name class
Previously, test/relaxng/ambig_name-class2.xml would fail to validate
against test/relaxng/ambig_name-class2.rng:

> test/relaxng/ambig_name-class2.rng:4:
>   element attribute: Relax-NG parser error :
>       Found anyName attribute without oneOrMore ancestor
> Relax-NG schema test/relaxng/ambig_name-class2.rng failed to compile

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2019-08-25 13:29:04 +02:00
Jan Pokorný
8074b88179 Fix unability to validate ambiguously constructed interleave for RelaxNG
Previously, test/relaxng/ambig_name-class.xml would fail to validate
for a simple reason -- interleave within "open-name-class" context
is supposed to be fine with whatever else is pending the consumption,
since effectively, it's unrelated from a higher parsing perspective.

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2019-08-25 13:29:04 +02:00
Nick Wellnhofer
c2f4da1a93 Improve XPath predicate and filter evaluation
Consolidate code paths evaluating XPath predicates and filters.

Don't push context node on stack when evaluating predicates. I have no
idea why this was done. It seems completely useless and trying to pop
the context node from a corrupted stack has already caused security
issues.

Filter nodesets in-place and don't create node sets with NULL gaps which
allows to simplify merging a great deal. Simply move matched nodes
backward and create a compact node set.

Merge xmlXPathCompOpEvalPositionalPredicate into
xmlXPathCompOpEvalPredicate.
2019-04-22 14:48:46 +02:00
Nick Wellnhofer
30a6533e01 Fix float casts in xmlXPathSubstringFunction
Rewrite conversion of double to int in xmlXPathSubstringFunction, adding
range checks to avoid undefined behavior. Make sure to add start and
length as floating-point numbers before converting to int. Fix a bug
when rounding negative start indices.

Remove unneeded calls to xmlXPathIs{Inf,NaN} and rely on IEEE math
instead. Avoid computing the string length. xmlUTF8Strsub works as
expected if the length of the requested substring exceeds the input.

Found with libFuzzer and UBSan.
2019-03-08 14:29:59 +01:00
Nikolai Weibull
c64d4efb31 Remove redefined starts and defines inside include elements
When including a grammar from another grammar, we need to make sure that any
redefines of starts and includes that that grammar does inside any of its
include elements are also removed.
2018-11-29 21:06:06 +01:00
Nikolai Weibull
46da8fc529 Allow choice within choice in nameClass in RELAX NG
The pattern nameClass allows for nested choice elements, for example

  <name>
    <choice>
      <choice>
        <name>a</name>
        <name>b</name>
      </choice>
      <name>c</name>
    </choice>
  </name>

which is semantically equivalent to

  <name>
    <choice>
      <name>a</name>
      <name>b</name>
      <name>c</name>
    </choice>
  </name>

The old code didn’t handle this correctly, as it never expected a choice inside
another choice.  This patch fixes this by flattening any nested choices.

This pattern of nested choice elements comes up in RELAX NG simplification,
where all choice elements are rewritten in this nested manner, see section 4.12
of the RELAX NG specification.
2018-11-29 21:03:11 +01:00
Nikolai Weibull
4338c310eb Look inside divs for starts and defines inside include
RELAX NG allows for div elements inside of include elements.  We need to look
inside those div elements for start and define elements that may be redefining
start and define elements in the included grammar.
2018-11-29 21:00:46 +01:00
Nick Wellnhofer
7218255092 Add test for ICU flush and pivot buffer 2017-11-04 15:38:58 +01:00
Nick Wellnhofer
5af594d8bc Fix comparison of nodesets to strings
Fix two bugs in xmlXPathNodeValHash which could lead to errors when
comparing nodesets to strings:

- Only use contents of text nodes to compute the hash for element nodes.
  Comments, PIs, and other node types don't affect the string-value and
  must be ignored.
- Reset `string` to NULL for node types other than text.

Reported by Aleksei on the mailing list:

    https://mail.gnome.org/archives/xml/2017-September/msg00016.html
2017-10-07 15:22:57 +02:00
Nick Wellnhofer
69936b129f Revert "Print error messages for truncated UTF-8 sequences"
This reverts commit 79c8a6b which caused a serious regression in
streaming mode.

Also reverts part of commit 52ceced "Fix infinite loops with push
parser in recovery mode".

Fixes bug 786554.
2017-08-30 14:19:06 +02:00
Nick Wellnhofer
899a5d9f0e Detect infinite recursion in parameter entities
When expanding a parameter entity in a DTD, infinite recursion could
lead to an infinite loop or memory exhaustion.

Thanks to Wei Lei for the first of many reports.

Fixes bug 759579.
2017-07-25 15:21:12 +02:00
Nick Wellnhofer
5f440d8cad Rework entity boundary checks
Make sure to finish all entities in the internal subset. Nevertheless,
readd a sanity check in xmlParseStartTag2 that was lost in my previous
commit. Also add a sanity check in xmlPopInput. Popping an input
unexpectedly was the source of many recent memory bugs. The check
doesn't mitigate such issues but helps with diagnosis.

Always base entity boundary checks on the input ID, not the input
pointer. The pointer could have been reallocated to the old address.

Always throw a well-formedness error if a boundary check fails. In a
few places, a validity error was thrown.

Fix a few error codes and improve indentation.
2017-06-17 13:25:53 +02:00
David Kilzer
85c112a082 Add test cases for bug 758518
test/HTML/758518-entity.html exposed a bug in pushParseTest() in
runtest.c which assumed that an input file was at least 4 bytes long.
That test case is only 3 bytes, so we now take the minimum of 4 bytes
or the length of the test input.  We also now use 'chunkSize' in place
of the hard-coded value '1024' later in the function.
2017-06-12 18:26:11 +02:00
Nick Wellnhofer
79c8a6b105 Print error messages for truncated UTF-8 sequences
Before, truncated UTF-8 sequences at the end of a file were treated as
EOF. Create an error message containing the offending bytes.

xmlStringCurrentChar would also print characters from the input stream,
not the string it's working on.
2017-06-10 18:11:58 +02:00
Nick Wellnhofer
932cc9896a Fix buffer size checks in xmlSnprintfElementContent
xmlSnprintfElementContent failed to correctly check the available
buffer space in two locations.

Fixes bug 781333 (CVE-2017-9047) and bug 781701 (CVE-2017-9048).

Thanks to Marcel Böhme and Thuan Pham for the report.
2017-06-05 19:38:19 +02:00
Nick Wellnhofer
e26630548e Fix handling of parameter-entity references
There were two bugs where parameter-entity references could lead to an
unexpected change of the input buffer in xmlParseNameComplex and
xmlDictLookup being called with an invalid pointer.

Percent sign in DTD Names
=========================

The NEXTL macro used to call xmlParserHandlePEReference. When parsing
"complex" names inside the DTD, this could result in entity expansion
which created a new input buffer. The fix is to simply remove the call
to xmlParserHandlePEReference from the NEXTL macro. This is safe because
no users of the macro require expansion of parameter entities.

- xmlParseNameComplex
- xmlParseNCNameComplex
- xmlParseNmtoken

The percent sign is not allowed in names, which are grammatical tokens.

- xmlParseEntityValue

Parameter-entity references in entity values are expanded but this
happens in a separate step in this function.

- xmlParseSystemLiteral

Parameter-entity references are ignored in the system literal.

- xmlParseAttValueComplex
- xmlParseCharDataComplex
- xmlParseCommentComplex
- xmlParsePI
- xmlParseCDSect

Parameter-entity references are ignored outside the DTD.

- xmlLoadEntityContent

This function is only called from xmlStringLenDecodeEntities and
entities are replaced in a separate step immediately after the function
call.

This bug could also be triggered with an internal subset and double
entity expansion.

This fixes bug 766956 initially reported by Wei Lei and independently by
Chromium's ClusterFuzz, Hanno Böck, and Marco Grassi. Thanks to everyone
involved.

xmlParseNameComplex with XML_PARSE_OLD10
========================================

When parsing Names inside an expanded parameter entity with the
XML_PARSE_OLD10 option, xmlParseNameComplex would call xmlGROW via the
GROW macro if the input buffer was exhausted. At the end of the
parameter entity's replacement text, this function would then call
xmlPopInput which invalidated the input buffer.

There should be no need to invoke GROW in this situation because the
buffer is grown periodically every XML_PARSER_CHUNK_SIZE characters and,
at least for UTF-8, in xmlCurrentChar. This also matches the code path
executed when XML_PARSE_OLD10 is not set.

This fixes bugs 781205 (CVE-2017-9049) and 781361 (CVE-2017-9050).
Thanks to Marcel Böhme and Thuan Pham for the report.

Additional hardening
====================

A separate check was added in xmlParseNameComplex to validate the
buffer size.
2017-06-05 18:38:33 +02:00
Nick Wellnhofer
7482f41f61 Check for integer overflow in xmlXPathFormatNumber
Check for overflow before casting double to int.

Found with afl-fuzz and UBSan.
2017-06-01 22:00:19 +02:00
Nick Wellnhofer
f4029cd413 Check XPath exponents for overflow
Avoid undefined behavior and wrong results with huge exponents.

Found with afl-fuzz and UBSan.
2017-05-31 16:04:37 +02:00
Nick Wellnhofer
a58331a6ee Check for overflow in xmlXPathIsPositionalPredicate
Avoid undefined behavior when casting from double to int.

Found with afl-fuzz and UBSan.
2017-05-31 16:04:26 +02:00
Nick Wellnhofer
a851868a75 Parse small XPath numbers more accurately
Don't count leading zeros towards the fraction size limit. This allows
to parse numbers like

    0.0000000000000000000000000000000000000000000000000000000001

which is the only standard-conformant way to represent such numbers, as
scientific notation isn't allowed in XPath 1.0. (It is allowed in XPath
2.0 and in libxml2 as an extension, though.)

Overall accuracy is still bad, see bug 783238.
2017-05-31 15:46:29 +02:00
Nick Wellnhofer
4bebb030db Rework XPath rounding functions
Use the C library's floor and ceil functions. The old code was overly
complicated for no apparent reason and could result in undefined
behavior when handling NaNs (found with afl-fuzz and UBSan).

Fix wrong comment in xmlXPathRoundFunction. The implementation was
already following the spec and rounding half up.
2017-05-31 15:38:42 +02:00
Nick Wellnhofer
40f5852149 Fix axis traversal from attribute and namespace nodes
When traversing the "preceding" axis from an attribute node, we must
first go up to the attribute's containing element. Otherwise, text
children of other attributes could be returned. This made it possible
to hit a code path in xmlXPathNextAncestor which contained another bug:
The attribute node was initialized with the context node instead of the
current node. Normally, this code path is only hit via
xmlXPathNextAncestorOrSelf in which case the current and context node
are the same.

The combination of the two bugs could result in an infinite loop, found
with libFuzzer.

Traversing the "following" and the "preceding" axis from namespace nodes
should be handled similarly. This wasn't supported at all previously.
2017-05-31 14:57:46 +02:00
Nick Wellnhofer
9ab01a277d Fix XPointer paths beginning with range-to
The old code would invoke the broken xmlXPtrRangeToFunction. range-to
isn't really a function but a special kind of location step. Remove
this function and always handle range-to in the XPath code.

The old xmlXPtrRangeToFunction could also be abused to trigger a
use-after-free error with the potential for remote code execution.

Found with afl-fuzz.

Fixes CVE-2016-5131.
2016-10-12 13:12:18 +02:00
Nick Wellnhofer
d8083bf779 Fix NULL pointer deref in XPointer range-to
- Check for errors after evaluating first operand.
- Add sanity check for empty stack.

Found with afl-fuzz.
2016-06-25 14:24:51 +02:00
Pranjal Jumde
0bcd05c5cd Heap-based buffer overread in htmlCurrentChar
For https://bugzilla.gnome.org/show_bug.cgi?id=758606

* parserInternals.c:
(xmlNextChar): Add an test to catch other issues on ctxt->input
corruption proactively.
For non-UTF-8 charsets, xmlNextChar() failed to check for the end
of the input buffer and would continuing reading.  Fix this by
pulling out the check for the end of the input buffer into common
code, and return if we reach the end of the input buffer
prematurely.
* result/HTML/758606.html: Added.
* result/HTML/758606.html.err: Added.
* result/HTML/758606.html.sax: Added.
* result/HTML/758606_2.html: Added.
* result/HTML/758606_2.html.err: Added.
* result/HTML/758606_2.html.sax: Added.
* test/HTML/758606.html: Added test case.
* test/HTML/758606_2.html: Added test case.
2016-05-23 15:01:07 +08:00
David Kilzer
0090675905 Heap-based buffer-underreads due to xmlParseName
For https://bugzilla.gnome.org/show_bug.cgi?id=759573

* parser.c:
(xmlParseElementDecl): Return early on invalid input to fix
non-minimized test case (759573-2.xml).  Otherwise the parser
gets into a bad state in SKIP(3) at the end of the function.
(xmlParseConditionalSections): Halt parsing when hitting invalid
input that would otherwise caused xmlParserHandlePEReference()
to recurse unexpectedly.  This fixes the minimized test case
(759573.xml).

* result/errors/759573-2.xml: Add.
* result/errors/759573-2.xml.err: Add.
* result/errors/759573-2.xml.str: Add.
* result/errors/759573.xml: Add.
* result/errors/759573.xml.err: Add.
* result/errors/759573.xml.str: Add.
* test/errors/759573-2.xml: Add.
* test/errors/759573.xml: Add.
2016-05-23 15:01:07 +08:00
Pranjal Jumde
38eae57111 Heap use-after-free in xmlSAX2AttributeNs
For https://bugzilla.gnome.org/show_bug.cgi?id=759020

* parser.c:
(xmlParseStartTag2): Attribute strings are only valid if the
base does not change, so add another check where the base may
change.  Make sure to set 'attvalue' to NULL after freeing it.
* result/errors/759020.xml: Added.
* result/errors/759020.xml.err: Added.
* result/errors/759020.xml.str: Added.
* test/errors/759020.xml: Added test case.
2016-05-23 15:01:07 +08:00
Pranjal Jumde
45752d2c33 Bug 759398: Heap use-after-free in xmlDictComputeFastKey <https://bugzilla.gnome.org/show_bug.cgi?id=759398>
* parser.c:
(xmlParseNCNameComplex): Store start position instead of a
pointer to the name since the underlying buffer may change,
resulting in a stale pointer being used.
* result/errors/759398.xml: Added.
* result/errors/759398.xml.err: Added.
* result/errors/759398.xml.str: Added.
* test/errors/759398.xml: Added test case.
2016-05-23 15:01:07 +08:00