1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00
Commit Graph

77 Commits

Author SHA1 Message Date
Nick Wellnhofer
05bd1720ce parser: Fix parsing of DTD content
Regressed in 2.11. Fixes #868.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
b4d3d87ed2 parser: Fix parsing of doctype declarations
Fix some long-standing issues.

Fixes #504.
2025-02-02 11:15:45 +01:00
Nick Wellnhofer
ffb058f484 parser: Fix detection of duplicate attributes
We really need a second scan if more than one namespace clash was
detected.
2024-10-28 20:26:55 +01:00
Nick Wellnhofer
bd9eed4694 parser: Make unsupported encodings an error in declarations
This was changed in 45157261, but in encoding declarations, unsupported
encodings should raise a fatal error.

Fixes #794.
2024-09-02 19:29:39 +02:00
Nick Wellnhofer
4fefba4cf6 parser: Rework handling of undeclared entities
Throw an error if entity substitution was requested.

Now we only downgrade to a warning if

- XML_PARSE_DTDLOAD wasn't specified, and
- entity aren't substituted or XML_PARSE_NO_XXE was specified.

Should fix #724.
2024-05-15 17:58:48 +02:00
Nick Wellnhofer
b717abdd09 parser: Consolidate error handling for undeclared entities
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
186562a182 parser: Fix detection of duplicate attributes in XML namespace
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.

Fixes #704.
2024-03-12 20:02:52 +01:00
Nick Wellnhofer
29beef653c parser: Pop inputs if parsing DTD failed
This should provide some statistics in ctxt->sizeentcopy even in the
error or recovery case.
2024-01-10 15:58:23 +01:00
Nick Wellnhofer
f237e5b934 parser: Avoid duplicate namespace errors
Don't report an extra attribute uniqueness error if a namespace is
undeclared. This matches old behavior.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
d0eb5a7e54 parser: Remove xmlErrEncodingInt
Convert the last user to xmlFatalErr.
2024-01-04 15:28:57 +01:00
Nick Wellnhofer
e8fb3d639f parser: Convert some "internal errors" to meaningful codes 2024-01-02 19:48:23 +01:00
Nick Wellnhofer
37c6618be5 parser: Rework parsing of attribute and entity values
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.

Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.

Normalize attribute values in a single pass while expanding entities.

Be more lenient in recovery mode.

If no entity substitution was requested, validate entities without
expanding. Fixes #596.

Also fixes #655.
2024-01-02 15:42:03 +01:00
Nick Wellnhofer
4ecc85d2cb parser: Push general entity input streams on the stack
This allows the error handler to give more context.
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
f3fa34dcad parser: Fix general entity parsing
Clear namespace database.

Ignore non-fatal errors.
2023-12-28 16:47:41 +01:00
Nick Wellnhofer
ecfbcc8a52 parser: Rework general entity parsing
Don't create a new parser context but reuse the existing one.

This exposes bug #601 in a more obvious way.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
8d0aaf4b95 parser: Remove xmlErrEncoding
Use xmlFatalErr or xmlCtxtErrIO.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
7e511f35f1 io: Pass error codes from xmlFileOpenReal to xmlNewInputFromFile
This allows to report the reason why opening a file failed to the parser
context and improve error messages. Now we can also remove the stat call
before opening a file.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
e58ea29f17 SAX2: Report malloc failures
Fix many places where malloc failures aren't reported.

Improve error handling when parsing entity declarations.

Fixes #308.
2023-12-11 22:13:05 +01:00
Nick Wellnhofer
a1f7ecaef8 entities: Report malloc failures
Fix places where malloc failures aren't reported.

Introduce new API function xmlAddEntity that returns separate error
codes.

Don't invoke global error handler for low-level errors which should be
handled by higher layers.

Invalid redelcaration warnings will be fixed later.
2023-12-11 22:05:47 +01:00
Nick Wellnhofer
43b511fa71 parser: Make CRLF increment line number
Partial revert of cb927e85 fixing CRLFs not incrementing the line
number.

This requires to rework xmlParseQNameHashed. The original implementation
prompted the change to xmlCurrentChar which really shouldn't modify the
'cur' pointer as side effect. But the NEXTL macro relies on this
behavior.

Ultimately, we should reintroduce the change to xmlCurrentChar and fix
the NEXTL macro. This will lead to single CRs incrementing the line
number as well which seems more consistent.

Fixes #628.
2023-11-26 15:18:09 +01:00
Nick Wellnhofer
134d2ad890 parser: Protect against quadratic default attribute expansion 2023-10-06 12:47:24 +02:00
Nick Wellnhofer
e48f3d8e0a tests: Add more tests for redefined attributes 2023-09-29 12:43:08 +02:00
Nick Wellnhofer
53050b1dd8 parser: More fixes to push parser error handling 2023-08-29 20:06:43 +02:00
Nick Wellnhofer
bbd918b2e7 parser: Fix detection of null bytes
Also suppress misleading extra errors.

Fixes #122.
2023-08-29 18:43:10 +02:00
Nick Wellnhofer
c6083a32d6 parser: Improve error handling in push parser
- Report errors earlier
- Align error messages with pull parser
2023-08-29 18:41:05 +02:00
Nick Wellnhofer
855818bd2b parser: Check for truncated multi-byte sequences
When decoding input data, check whether the "raw" buffer is empty after
parsing the document. Otherwise, the input ends with a truncated
multi-byte sequence which shouldn't be silently ignored.
2023-08-08 15:21:37 +02:00
Nick Wellnhofer
74aa61e0bd parser: Halt parser on DTD errors
If we try to continue parsing after an error in the internal or external
subset, entity expansion accounting gets more complicated. Simply halt
the parser.

Found with libFuzzer.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
d320a683d1 parser: Fix entity check in attributes
Don't set the "checked" flag when checking entities in default attribute
values. These entities could reference other entities which weren't
defined yet, so the check isn't reliable.

This fixes a short-lived regression which could lead to a call stack
overflow later in xmlStringGetNodeList.
2023-01-17 13:59:24 +01:00
Nick Wellnhofer
a41b09c739 parser: Improve detection of entity loops
Set a flag to detect entity loops at once instead of processing until
the depth limit is exceeded.
2022-12-23 22:11:18 +01:00
Nick Wellnhofer
d972393f30 parser: Only report a single entity error
Don't report errors multiple times for nested entity references.
2022-12-23 22:10:39 +01:00
Nick Wellnhofer
76c6da4209 error: Make sure that error messages are valid UTF-8
This has caused issues with the Python bindings for a long time.

Should fix #64.
2022-12-04 23:34:19 +01:00
Nick Wellnhofer
68a6518c45 parser: Rewrite push parser boundary checks
Remove inaccurate xmlParseCheckTransition check.

Remove non-incremental xmlParseGetLasts check.

Add functions that check for several boundary constructs more
accurately, keeping track of progress in ctxt->checkIndex.

Fixes #439.
2022-11-20 21:27:08 +01:00
Nick Wellnhofer
f61b8a6233 parser: Fix DTD parser progress checks
This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, change some DTD parser
functions to make guaranteed progress on certain byte sequences.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
f1c32b4c78 Allow missing result files in runtest
Treat missing files as empty.
2022-04-04 04:28:15 +02:00
Nick Wellnhofer
ce0871e15c Only warn on invalid redeclarations of predefined entities
Downgrade the error message to a warning since the error was ignored,
anyway. Also print the name of redeclared entity. For a proper fix that
also shows filename and line number of the invalid redeclaration, we'd
have to

- pass the parser context to the entity functions somehow, or
- make these functions return distinct error codes.

Partial fix for #308.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
9edc20c154 Fix double counting of CRLF in comments
Fixes #151.
2022-02-07 20:54:07 +01:00
Nick Wellnhofer
de5b624f10 Fix handling of unexpected EOF in xmlParseContent
Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was
removed in commit 62150ed2.

This commit also introduced a regression for direct users of
xmlParseContent. Unclosed tags weren't checked.
2021-05-08 20:47:36 +02:00
Nick Wellnhofer
3e80560d4b Fix line numbers in error messages for mismatched tags
Commit 62150ed2 introduced a small regression in the error messages for
mismatched tags. This typically only affected messages after the first
mismatch, but with custom SAX handlers all line numbers would be off.

This also fixes line numbers in the SAX push parser which were never
handled correctly.
2021-05-07 11:48:11 +02:00
Nick Wellnhofer
01411e7c5e Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
2021-02-08 21:51:26 +01:00
Nick Wellnhofer
79301d3d5e Fix timeout when handling recursive entities
Abort parsing early to avoid an almost infinite loop in certain error
cases involving recursive entities.

Found with libFuzzer.
2020-12-18 14:13:46 +01:00
Nick Wellnhofer
32cb5dccda Add test case for recursive external parsed entities 2020-02-11 17:36:43 +01:00
Nick Wellnhofer
f20daa9e51 Enable error tests with entity substitution 2020-02-11 17:36:43 +01:00
Nick Wellnhofer
c2f209c09f Disallow conditional sections in internal subset
Conditional sections are only allowed in *external* parameter entities
referenced from the internal subset.
2019-09-30 15:47:30 +02:00
Nick Wellnhofer
c51e38cb3a Make xmlParseConditionalSections non-recursive
Avoid call stack overflow in deeply nested conditional sections.

Found by OSS-Fuzz.
2019-09-30 15:47:30 +02:00
Nick Wellnhofer
62150ed2ab Make xmlParseContent and xmlParseElement non-recursive
Split xmlParseElement into subfunctions. Use nameNsPush to store prefix,
URI and nsNr on the heap, similar to the push parser.

Closes #84.
2019-09-23 17:45:50 +02:00
Nick Wellnhofer
f9fce96313 Fix unsigned integer overflow
It's defined behavior but -fsanitize=unsigned-integer-overflow is
useful to discover bugs.
2019-05-20 13:38:22 +02:00
Nick Wellnhofer
123234f2cf Free input buffer in xmlHaltParser
This avoids miscalculation of available bytes.

Thanks to Yunho Kim for the report.

Closes: #26
2018-09-11 15:06:17 +02:00
Nick Wellnhofer
69936b129f Revert "Print error messages for truncated UTF-8 sequences"
This reverts commit 79c8a6b which caused a serious regression in
streaming mode.

Also reverts part of commit 52ceced "Fix infinite loops with push
parser in recovery mode".

Fixes bug 786554.
2017-08-30 14:19:06 +02:00
Nick Wellnhofer
899a5d9f0e Detect infinite recursion in parameter entities
When expanding a parameter entity in a DTD, infinite recursion could
lead to an infinite loop or memory exhaustion.

Thanks to Wei Lei for the first of many reports.

Fixes bug 759579.
2017-07-25 15:21:12 +02:00
Nick Wellnhofer
872fea9485 Get rid of "blanks wrapper" for parameter entities
Now that replacement of parameter entities goes exclusively through
xmlSkipBlankChars, we can account for the surrounding space characters
there and remove the "blanks wrapper" hack.
2017-06-20 13:19:47 +02:00