This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, change some content
parser functions to make guaranteed progress on certain byte sequences.
This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, make the attribute parser
functions return a NULL name only if they don't make progress.
This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, change some DTD parser
functions to make guaranteed progress on certain byte sequences.
In some cases, for example when using encoders, the read callback was
set to NULL, in other cases it was set to xmlInputReadCallbackNop.
xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors
when parsing large encoded content from memory.
Always use a NULL callback for memory buffers to avoid ambiguities.
Fixes #262.
Also impose size limits when XML_PARSE_HUGE is set. Limit size of names
to XML_MAX_TEXT_LENGTH (10 million bytes) and other content to
XML_MAX_HUGE_LENGTH (1 billion bytes).
Move some of the length checks to the end of the respective loop to make
them strict.
xmlParseEntityValue didn't have a length limitation at all. But without
XML_PARSE_HUGE, this should eventually trigger an error in xmlGROW.
Thanks to Maddie Stone working with Google Project Zero for the report!
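
A minimal sketch of what a strict end-of-loop length check can look like.
The function `scan_limited` and its parameters are hypothetical and not
part of libxml2; the point is only that the check runs after every
increment, so the loop can never accept more than `max_len` bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count bytes up to `stop` or the terminating NUL.
 * The limit is tested at the end of each iteration, right after `len`
 * grows, so a value above `max_len` is rejected immediately.
 */
static long scan_limited(const char *s, char stop, long max_len)
{
    long len = 0;

    while (s[len] != '\0' && s[len] != stop) {
        len++;
        if (len > max_len)
            return -1; /* limit exceeded */
    }
    return len;
}
```

With `max_len` set to 3, the input "abc;" is accepted with length 3 and
"abcd;" is rejected.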
Fix memory leak in case xmlParseAttValueInternal is called with a NULL
`len` and a non-NULL `alloc` argument. This static function is never called
with such arguments internally, but the misleading code should be fixed
nevertheless.
Fixes #422.
Remove explicit integer casts as final operation
- in assignments
- when passing arguments
- when returning values
Remove casts
- to the same type
- from certain range-bound values
The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.
Document some explicit casts as truncating and add a few missing ones.
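
To illustrate the motivation, here is a small sketch with hypothetical
helpers (none of these functions exist in libxml2). An explicit cast as
the final operation is exempt from Clang's
`-fsanitize=implicit-conversion` checks, while the same conversion
written implicitly can be reported if the value does not fit.

```c
#include <limits.h>
#include <stddef.h>

/* Explicit cast: truncation is silent, the sanitizer skips it. */
static int len_with_cast(size_t n)
{
    return (int) n;
}

/* Implicit conversion: same result, but the sanitizer can flag
 * values that don't fit into an int. */
static int len_without_cast(size_t n)
{
    return n;
}

/* Range-bound value: after the check the conversion is lossless,
 * so no cast is needed and the sanitizer stays quiet. */
static int len_checked(size_t n)
{
    if (n > INT_MAX)
        return INT_MAX;
    return n;
}
```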
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
xmlCtxtReadDoc used to create an input stream involving
xmlNewStringInputStream. This would create a stream without an input
buffer, causing problems with encodings (see #34).
After commit aab584dc3, an error was returned even with UTF-8 encodings
which happened to work before.
Make xmlCtxtReadDoc call xmlCtxtReadMemory which doesn't suffer from
these issues. Also fix htmlCtxtReadDoc.
Fixes #397.
* HTMLparser.c:
(htmlSkipBlankChars):
* parser.c:
(xmlSkipBlankChars):
- Cap the return value at INT_MAX.
- The commit range that OSS-Fuzz listed for the fix didn't make
any changes to xmlSkipBlankChars(), so it seems like this
issue may still exist.
Found by OSS-Fuzz Issue 44803.
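
A simplified sketch of capping the return value at INT_MAX. This is not
the actual xmlSkipBlankChars code; `skip_blanks` is a hypothetical
stand-in that works on a plain NUL-terminated string.

```c
#include <limits.h>

/*
 * Hypothetical helper: advance *cur past XML whitespace and return
 * the number of bytes skipped, saturating at INT_MAX instead of
 * overflowing the int counter.
 */
static int skip_blanks(const char **cur)
{
    const char *p = *cur;
    int res = 0;

    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
        if (res < INT_MAX)
            res++;
        p++;
    }
    *cur = p;
    return res;
}
```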
Similar to 8f5710379, mark more static data structures with
`const` keyword.
Also fix placement of `const` in encoding.c.
Original patch by Sarah Wilkin.
Testing the current input pointer for modification is unreliable since
the input buffer could have been freed and reallocated. Check whether the
input id and the up-to-date number of bytes consumed match.
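
A rough sketch of the idea, using a hypothetical `Input` struct that
only mirrors the relevant fields (id, base, cur, consumed). The real
parser structures and the exact check are not shown here.

```c
/* Hypothetical, reduced input structure. */
typedef struct {
    int id;                     /* identifies the input stream */
    const unsigned char *base;  /* start of the current buffer */
    const unsigned char *cur;   /* current read position */
    unsigned long consumed;     /* bytes dropped before `base` */
} Input;

/* Up-to-date number of bytes consumed, independent of where the
 * buffer currently lives in memory. */
static unsigned long input_pos(const Input *in)
{
    return in->consumed + (unsigned long) (in->cur - in->base);
}

/* Progress was made if the input changed or the position moved. */
static int made_progress(const Input *in, int old_id,
                         unsigned long old_pos)
{
    return in->id != old_id || input_pos(in) != old_pos;
}
```

If the buffer is reallocated while the parser stays at the same offset,
`cur` changes but `input_pos` does not, so no progress is reported.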
There doesn't seem to be a good reason to abort in xmlParseReference
if a well-formedness error was detected. Removing this check allows
parsing entity references after an error in recovery mode.
Fixes #270.
In most places, we really need the double-it scheme to avoid quadratic
behavior. The hybrid scheme still can cause many reallocations and the
bounded scheme doesn't seem to provide meaningful protection in
xmlreader.c.
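
For reference, a generic sketch of the double-it scheme. `GrowBuf` and
`growbuf_append` are hypothetical and not the libxml2 buffer API; the
initial capacity of 16 is an arbitrary assumption. Doubling the capacity
keeps the number of reallocations logarithmic in the final size, so the
total copying cost stays linear.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. */
typedef struct {
    char *data;
    size_t size; /* bytes in use */
    size_t cap;  /* bytes allocated */
} GrowBuf;

/* Append `len` bytes, doubling the capacity when more room is
 * needed. Returns 0 on success, -1 on overflow or allocation
 * failure. */
static int growbuf_append(GrowBuf *b, const char *src, size_t len)
{
    size_t need;

    if (len > SIZE_MAX - b->size)
        return -1;
    need = b->size + len;

    if (need > b->cap) {
        size_t ncap = b->cap ? b->cap : 16;
        char *tmp;

        while (ncap < need) {
            if (ncap > SIZE_MAX / 2) {
                ncap = need; /* can't double any further */
                break;
            }
            ncap *= 2;
        }
        tmp = realloc(b->data, ncap);
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = ncap;
    }

    memcpy(b->data + b->size, src, len);
    b->size = need;
    return 0;
}
```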
Before, we tried to reset the last error in xmlCleanupParser. But if
xmlCleanupParser wasn't called from the main thread, this would reset
the thread-local error object. xmlCleanupGlobals has access to the
error object of the main thread and can reset it reliably.
From what I can tell, some really early Cygwin versions from around
1998-2000 used to erroneously define _WIN32. This was eventually fixed,
but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is
unnecessary.
Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether
to use __declspec.
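
An illustrative sketch of the resulting pattern. The macro `MYLIB_EXPORT`
and the function `mylib_answer` are hypothetical and this is not the
actual content of xmlexports.h. Code that needs the native Windows API
can test `_WIN32` alone, while the export macro also covers Cygwin.

```c
/* Assumption: both native Windows and Cygwin builds want
 * __declspec for exported symbols. */
#if defined(_WIN32) || defined(__CYGWIN__)
  #define MYLIB_EXPORT __declspec(dllexport)
#else
  #define MYLIB_EXPORT
#endif

/* Hypothetical exported function. */
MYLIB_EXPORT int mylib_answer(void)
{
#if defined(_WIN32)
    /* Windows-only code path; no `!defined(__CYGWIN__)` needed. */
#endif
    return 42;
}
```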