Make sure that references from IDs are updated.
Note that if there are IDs with the same value in a document, the last
one will now be returned. IDs should be unique, but maybe this should be
addressed.
Fix many places where malloc failures aren't reported.
Make some API function return an error code. Changing the return type
from void to int is technically an ABI break but should be safe on most
platforms.
- xmlNodeSetContent
- xmlNodeSetContentLen
- xmlNodeAddContent
- xmlNodeAddContentLen
- xmlNodeSetBase
Introduce new API functions that return a separate error code if a
memory allocation fails.
- xmlNodeGetAttrValue
- xmlNodeGetBaseSafe
- xmlGetNsListSafe
Introduce private functions xmlTreeEnsureXMLDecl and xmlSplitQName4.
Don't report low-level errors to the global error handler.
Fix tree
Introduce xmlGetNsListSafe
Fix tree
Entities which reference out-of-scope namespace have always been broken.
xmlParseBalancedChunkMemoryInternal tried to reuse the namespaces
currently in scope but these namespaces were ignored by the SAX handler.
Besides, there could be different namespaces in scope when expanding the
entity again. For example:
<!DOCTYPE doc [
<!ENTITY ent "<ns:elem/>">
]>
<doc>
<decl1 xmlns:ns="urn:ns1">
&ent;
</decl1>
<decl2 xmlns:ns="urn:ns2">
&ent;
</decl2>
</doc>
Add some comments outlining possible solutions to this problem.
For now, we stop copying namespaces to the temporary parser context
in xmlParseBalancedChunkMemoryInternal. This has never really worked
and the recent changes contained a partial fix which uncovered other
problems like a use-after-free with the XML Reader interface, found
by OSS-Fuzz.
To check whether an entity was already parsed, the code previously
tested whether "checked" was non-zero or "children" was non-null. The
"children" check could be unreliable because an empty entity also
results in an empty (NULL) node list. Use a separate flag to make this
check more reliable.
Remove explicit integer casts as final operation
- in assignments
- when passing arguments
- when returning values
Remove casts
- to the same type
- from certain range-bound values
The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.
Document some explicit casts as truncating and add a few missing ones.
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
This is a follow-up to commit 6c283d83.
* buf.c:
(xmlBufGrowInternal):
- Call xmlBufMemoryError() when the buffer size would overflow.
- Account for NUL terminator byte when using XML_MAX_TEXT_LENGTH.
- Do not include NUL terminator byte when returning length.
(xmlBufAdd):
- Call xmlBufMemoryError() when the buffer size would overflow.
* tree.c:
(xmlBufferGrow):
- Call xmlTreeErrMemory() when the buffer size would overflow.
- Do not include NUL terminator byte when returning length.
(xmlBufferResize):
- Update error message in xmlTreeErrMemory() to be consistent
with other similar messages.
(xmlBufferAdd):
- Call xmlTreeErrMemory() when the buffer size would overflow.
(xmlBufferAddHead):
- Add overflow checks similar to those in xmlBufferAdd().
* buf.c:
(xmlBufAddLen):
- Change check for remaining space to account for the NUL
terminator. When adding a length exactly equal to the number
of unused bytes, a NUL terminator was not written.
(xmlBufResize):
- Set `buf->use` and NUL terminator when allocating a new
buffer.
* tree.c:
(xmlBufferResize):
- Set `buf->use` and NUL terminator when allocating a new
buffer.
(xmlBufferAddHead):
- Set NUL terminator before returning early when shifting
contents.
When changing `doc` on an xmlNodePtr or xmlAttrPtr, certain
fields must either be a free-standing string, or they must be
owned by `doc->dict`.
The code to make this change was simply missing, so the crash
happened when an xmlAttrPtr was being torn down after `doc`
changed from non-NULL to NULL, but the `name` field was not
copied. This is scenario 1 below.
The xmlNodePtr->name and xmlNodePtr->content fields are also
fixed at the same time. Note that xmlNodePtr->content is never
added to the dictionary, so NULL is used instead of `newDict` to
force a free-standing copy.
This change covers all cases of dictionary changes:
1. Owned by old dictionary -> NULL new dictionary
- Create free-standing copy of string.
2. Owned by old dictionary -> Non-NULL new dictionary
- Get string from new dictionary pool.
3. Not owned by old dictionary -> Non-NULL new dictionary
- No action necessary (already a free-standing string).
4. Not owned by old dictionary -> NULL new dictionary
- No action necessary (already a free-standing string).
* tree.c:
(_copyStringForNewDictIfNeeded): Add.
(xmlSetTreeDoc):
- Update xmlNodePtr->name, xmlNodePtr->content and
xmlAttrPtr->name when changing the document, if needed.
Found by OSS-Fuzz Issue 45132.
In several places, the code handling string buffers didn't check for
integer overflow or used wrong types for buffer sizes. This could
result in out-of-bounds writes or other memory errors when working on
large, multi-gigabyte buffers.
Thanks to Felix Wilhelm for the report.
Commit 7618a3b1 didn't account for coalesced text nodes.
I think it would be better if xmlStaticCopyNode didn't try to coalesce
text nodes at all. This code path can only be triggered if some other
code doesn't coalesce text nodes properly. In this case, OSS-Fuzz found
such behavior in xinclude.c.
In most places, we really need the double-it scheme to avoid quadratic
behavior. The hybrid scheme still can cause many reallocations and the
bounded scheme doesn't seem to provide meaningful protection in
xmlreader.c.
This code has been broken and deprecated since version 2.6.0, released
in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never
compiled since 2012. I couldn't find a Debian package using any of its
symbols, so it seems safe to remove this module.