Throw an error if entity substitution was requested.
Now we only downgrade to a warning if
- XML_PARSE_DTDLOAD wasn't specified, and
- entity aren't substituted or XML_PARSE_NO_XXE was specified.
Should fix#724.
Always use XML_WAR_UNDECLARED_ENTITY with warning error level in
documents with external subset or parameter entities. Use
XML_ERR_UNDECLARED_ENTITY otherwise.
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.
Fixes#704.
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.
Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.
Normalize attribute values in a single pass while expanding entities.
Be more lenient in recovery mode.
If no entity substitution was requested, validate entities without
expanding. Fixes#596.
Also fixes#655.
This allows to report the reason why opening a file failed to the parser
context and improve error messages. Now we can also remove the stat call
before opening a file.
Fix places where malloc failures aren't reported.
Introduce new API function xmlAddEntity that returns separate error
codes.
Don't invoke global error handler for low-level errors which should be
handled by higher layers.
Invalid redelcaration warnings will be fixed later.
Partial revert of cb927e85 fixing CRLFs not incrementing the line
number.
This requires to rework xmlParseQNameHashed. The original implementation
prompted the change to xmlCurrentChar which really shouldn't modify the
'cur' pointer as side effect. But the NEXTL macro relies on this
behavior.
Ultimately, we should reintroduce the change to xmlCurrentChar and fix
the NEXTL macro. This will lead to single CRs incrementing the line
number as well which seems more consistent.
Fixes#628.
When decoding input data, check whether the "raw" buffer is empty after
parsing the document. Otherwise, the input ends with a truncated
multi-byte sequence which shouldn't be silently ignored.
If we try to continue parsing after an error in the internal or external
subset, entity expansion accounting gets more complicated. Simply halt
the parser.
Found with libFuzzer.
Don't set the "checked" flag when checking entities in default attribute
values. These entities could reference other entities which weren't
defined yet, so the check isn't reliable.
This fixes a short-lived regression which could lead to a call stack
overflow later in xmlStringGetNodeList.
Remove inaccurate xmlParseCheckTransition check.
Remove non-incremental xmlParseGetLasts check.
Add functions that check for several boundary constructs more
accurately, keeping track of progress in ctxt->checkIndex.
Fixes#439.
This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, change some DTD parser
functions to make guaranteed progress on certain byte sequences.
Downgrade the error message to a warning since the error was ignored,
anyway. Also print the name of redeclared entity. For a proper fix that
also shows filename and line number of the invalid redeclaration, we'd
have to
- pass the parser context to the entity functions somehow, or
- make these functions return distinct error codes.
Partial fix for #308.
Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was
removed in commit 62150ed2.
This commit also introduced a regression for direct users of
xmlParseContent. Unclosed tags weren't checked.
Commit 62150ed2 introduced a small regression in the error messages for
mismatched tags. This typically only affected messages after the first
mismatch, but with custom SAX handlers all line numbers would be off.
This also fixes line numbers in the SAX push parser which were never
handled correctly.
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.
Note that some test cases declared
<!ENTITY lt "<">
But the XML spec clearly states that this is illegal:
> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.
Also fixes#217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.
As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
This reverts commit 79c8a6b which caused a serious regression in
streaming mode.
Also reverts part of commit 52ceced "Fix infinite loops with push
parser in recovery mode".
Fixes bug 786554.
When expanding a parameter entity in a DTD, infinite recursion could
lead to an infinite loop or memory exhaustion.
Thanks to Wei Lei for the first of many reports.
Fixes bug 759579.
Now that replacement of parameter entities goes exclusively through
xmlSkipBlankChars, we can account for the surrounding space characters
there and remove the "blanks wrapper" hack.
Pop all extra input streams before resetting the input. Otherwise,
a call to xmlPopInput could make input available again.
Also set input->end to input->cur.
Changes the test output for some error tests. Unfortunately, some
fuzzed test cases were added to the test suite without manual cleanup.
This makes it almost impossible to review the impact of later changes
on the test output.