Most string functions can assume valid UTF-8. In order to detect malloc
failures reliably, xmlUTF8Strsub should only return NULL if the start
index is out of bounds or a memory allocation failed.
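The NULL contract above lets callers treat a NULL result with valid arguments as an allocation failure. A minimal sketch of a substring function with that contract follows. It assumes valid UTF-8, as the entry says most string functions may. `sketch_utf8_strsub` and `utf8_seq_len` are hypothetical names; this is not libxml2's xmlUTF8Strsub. Clamping an over-long `len` and treating a negative `len` as empty are assumptions of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Byte length of a UTF-8 sequence, judged from its lead byte only;
 * acceptable here because the input is assumed to be valid UTF-8. */
static size_t utf8_seq_len(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Hypothetical: copy of up to len characters of s, starting at character
 * index start. Returns NULL only for a NULL s, an out-of-bounds start,
 * or a failed allocation. */
unsigned char *sketch_utf8_strsub(const unsigned char *s, int start, int len) {
    const unsigned char *begin = s, *end;
    unsigned char *ret;
    size_t n;
    int i;

    if (s == NULL || start < 0)
        return NULL;
    for (i = 0; i < start; i++) {
        if (*begin == '\0')
            return NULL;                /* start index out of bounds */
        begin += utf8_seq_len(*begin);
    }
    /* A len running past the end is clamped rather than treated as an
     * error; a negative len yields an empty string. */
    end = begin;
    for (i = 0; i < len && *end != '\0'; i++)
        end += utf8_seq_len(*end);
    n = (size_t)(end - begin);
    ret = malloc(n + 1);
    if (ret == NULL)
        return NULL;                    /* allocation failure */
    memcpy(ret, begin, n);
    ret[n] = '\0';
    return ret;
}
```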
Introduce xmlStrVASPrintf, trying to handle buggy snprintf
implementations.
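One common way to format into an allocated string while tolerating non-conforming snprintf variants is sketched below. Some older implementations return -1 or the truncated length instead of the required size. This is an illustration, not the actual xmlStrVASPrintf. `sketch_vasprintf`, `sketch_asprintf`, the 64-byte initial size, and the doubling strategy are all assumptions.

```c
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: format into a freshly allocated buffer, growing it until
 * the output provably fits. Returns NULL on allocation failure or if the
 * result would approach INT_MAX bytes. */
char *sketch_vasprintf(const char *fmt, va_list ap) {
    size_t size = 64;
    char *buf = NULL;

    for (;;) {
        va_list copy;
        char *tmp;
        int ret;

        tmp = realloc(buf, size);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;

        /* A va_list can only be traversed once, so format from a copy. */
        va_copy(copy, ap);
        ret = vsnprintf(buf, size, fmt, copy);
        va_end(copy);

        /* Trust only a result that fits with room to spare: buggy
         * implementations may return -1 or the truncated length, so a
         * result of size - 1 is ambiguous and also triggers a retry. */
        if (ret >= 0 && (size_t)ret < size - 1)
            return buf;

        if (size > INT_MAX / 2) {       /* don't produce huge strings */
            free(buf);
            return NULL;
        }
        size *= 2;                      /* ignore ret; just double */
    }
}

/* Variadic convenience wrapper around sketch_vasprintf. */
char *sketch_asprintf(const char *fmt, ...) {
    va_list ap;
    char *ret;

    va_start(ap, fmt);
    ret = sketch_vasprintf(fmt, ap);
    va_end(ap);
    return ret;
}
```

Because a persistent -1 cannot be told apart from a real encoding error, the size cap is what ends the loop in that case. A real implementation might choose a lower limit.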
Introduce xmlSetError to set errors atomically.
Introduce xmlUpdateError to set an error, fixing up node, file and line.
Introduce helper function xmlRaiseMemoryError.
Make legacy error handlers call xmlReportError, avoiding checks in
xmlVRaiseError.
Remove fragile support for getting file and line info from XInclude
nodes.
Functions like xmlStrdup are called in the error handling code
(__xmlRaiseError) which can cause problems like use-after-free or
infinite loops when invoked recursively.
Calling xmlErrMemory without a context argument isn't helpful anyway.
Found with libFuzzer, see #344.
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
For historical reasons, the string API operates with int indices which
can overflow, especially on 64-bit systems. libxml2 always made the
tacit assumption that strings will never be larger than INT_MAX bytes.
It should be considered a bug if any part of the code can produce
larger strings, whether they are externally visible or not.
Likewise, API users are expected not to supply strings larger than
INT_MAX bytes. This requirement isn't documented, but even if it were,
we must handle larger strings passed in by accident without causing
memory errors.
- xmlStrndup, xmlCharStrndup, xmlUTF8Strndup
Avoid integer overflow if len == INT_MAX.
- xmlStrlen, xmlUTF8Strsize, xmlUTF8Strloc
Avoid integer overflow by using size_t for index. If an input string
larger than INT_MAX bytes is detected, these functions now return 0
instead of a wrong and possibly negative value.
- xmlCheckUTF8
Avoid integer overflow by limiting index range.
- xmlStrncat, xmlStrncatNew, xmlEscapeFormatString
Avoid integer overflow. Return NULL instead of producing strings
larger than INT_MAX bytes.
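The three overflow fixes listed above follow one pattern: do size arithmetic in size_t and check against INT_MAX before narrowing back to int. The sketch below shows that pattern. `sketch_strndup`, `sketch_strlen`, and `sketch_concat` are hypothetical stand-ins, not libxml2's code. Rejecting a negative `len` in `sketch_concat` is an assumption of the sketch.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Copies exactly len bytes of s (the caller guarantees they exist).
 * The allocation size is computed in size_t, so len == INT_MAX does not
 * overflow the way the int expression len + 1 would. */
char *sketch_strndup(const char *s, int len) {
    char *ret;

    if (s == NULL || len < 0)
        return NULL;
    ret = malloc((size_t)len + 1);
    if (ret == NULL)
        return NULL;
    memcpy(ret, s, (size_t)len);
    ret[len] = '\0';
    return ret;
}

/* Measures in size_t; a string longer than INT_MAX bytes yields 0
 * instead of a truncated or negative int. */
int sketch_strlen(const char *s) {
    size_t len;

    if (s == NULL)
        return 0;
    len = strlen(s);
    return len > INT_MAX ? 0 : (int)len;
}

/* New string holding str1 followed by len bytes of str2, or NULL if the
 * result would exceed INT_MAX bytes or allocation fails. */
char *sketch_concat(const char *str1, const char *str2, int len) {
    size_t size1;
    char *ret;

    if (str2 == NULL || len < 0)
        return NULL;
    size1 = str1 ? strlen(str1) : 0;
    /* size1 <= INT_MAX is checked first, so INT_MAX - size1 can't wrap. */
    if (size1 > INT_MAX || (size_t)len > INT_MAX - size1)
        return NULL;
    ret = malloc(size1 + (size_t)len + 1);
    if (ret == NULL)
        return NULL;
    if (size1 > 0)
        memcpy(ret, str1, size1);
    memcpy(ret + size1, str2, (size_t)len);
    ret[size1 + (size_t)len] = '\0';
    return ret;
}
```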
With certain invalid UTF-8, xmlUTF8Strsize can read up to 6 bytes
beyond the end of the string and return the wrong size.
This means that in xmlUTF8Strndup and similar code, some content behind
the string is copied. But since the terminating \0 is copied as well,
this probably can't be exploited to leak sensitive information.
Found by afl-fuzz and ASan.
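The overread above happens when a lead byte's announced length is trusted and the code skips ahead without looking at the bytes in between. The sketch below inspects every continuation byte instead. A truncated sequence therefore stops at the terminating '\0'. `sketch_utf8_strsize` is a hypothetical name, not the patched libxml2 function. It also applies the "return 0 above INT_MAX" rule from the overflow fixes.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical: number of bytes taken by the first len characters of
 * utf, never reading past the terminating '\0'. A malformed sequence is
 * cut short at the first byte that isn't a continuation byte. */
int sketch_utf8_strsize(const unsigned char *utf, int len) {
    const unsigned char *p = utf;
    size_t size;

    if (utf == NULL || len <= 0)
        return 0;
    while (len-- > 0 && *p != '\0') {
        unsigned char c = *p++;

        /* Each extra leading 1 bit in a lead byte announces one more
         * continuation byte; check each one instead of skipping ahead. */
        if (c & 0x80) {
            c <<= 1;
            while (c & 0x80) {
                if ((*p & 0xC0) != 0x80)
                    break;              /* '\0' or not a continuation */
                p++;
                c <<= 1;
            }
        }
    }
    size = (size_t)(p - utf);
    return size > INT_MAX ? 0 : (int)size;
}
```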
* xmlschemas.c xmlstring.c: Fixed a segfault during
text concatenation when validating a node tree:
xmlStrncat was called with a @len of -1; but unlike
xmlStrncatNew, it does not calculate the length
automatically in such a case (reported by Judy Hay
on the mailing list).
Updated the descriptions of the involved string
functions to note this.
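A len of -1 is dangerous for an append function that doesn't special-case it, because converting -1 to size_t yields SIZE_MAX. The hypothetical helper below handles that case explicitly. It follows the convention the entry attributes to xmlStrncatNew, where a negative len means "use the whole string". `sketch_append` is illustrative only. It is not xmlStrncat or xmlStrncatNew, and its failure behavior (leaving `cur` untouched) is an assumption.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: grows the heap string cur (may be NULL) by len bytes of
 * add. A negative len means "all of add". On failure NULL is returned
 * and cur is left untouched, so the caller still owns it. */
char *sketch_append(char *cur, const char *add, int len) {
    size_t curlen, addlen;
    char *ret;

    if (add == NULL)
        return cur;
    /* Without this check, len == -1 would become SIZE_MAX below. */
    addlen = len < 0 ? strlen(add) : (size_t)len;
    curlen = cur ? strlen(cur) : 0;
    if (curlen > INT_MAX || addlen > INT_MAX - curlen)
        return NULL;                    /* keep results within INT_MAX */
    ret = realloc(cur, curlen + addlen + 1);
    if (ret == NULL)
        return NULL;
    memcpy(ret + curlen, add, addlen);
    ret[curlen + addlen] = '\0';
    return ret;
}
```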
* HTMLparser.c configure.in parserInternals.c runsuite.c runtest.c
testapi.c xmlschemas.c xmlschemastypes.c xmlstring.c: fixed a number
of warnings shown by HP-UX compiler and reported by Rick Jones
Daniel
* error.c globals.c parser.c runtest.c testHTML.c testSAX.c
threads.c valid.c xmllint.c xmlreader.c xmlschemas.c xmlstring.c
xmlwriter.c include/libxml/parser.h include/libxml/relaxng.h
include/libxml/valid.h include/libxml/xmlIO.h
include/libxml/xmlerror.h include/libxml/xmlexports.h
include/libxml/xmlschemas.h: applied a patch from Marcus Boerger
to fix problems with calling conventions on Windows; this should
fix #309757
Daniel
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
format to cope with the gcc4 change in allowed aliasing scopes; had
to add extra information to doc/libxml2-api.xml to separate
the header from the C module source.
* *.c: updated all c library files to add a #define bottom_xxx
and reimport elfgcchack.h thereafter, and did a bit of cleanup.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
* parser.c: reset input->base within xmlStopParser
* xmlstring.c: removed call to xmlUTF8Strlen from within
xmlUTF8Strpos (Bill Moseley pointed out it was not
useful)
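A separate length pass is redundant because reaching the terminator while walking already shows that the position is out of range. The hypothetical `sketch_utf8_strpos` below illustrates this. It is not libxml2's xmlUTF8Strpos. Returning NULL when `pos` equals the character count, and rejecting malformed sequences, are assumptions of the sketch.

```c
#include <stddef.h>

/* Hypothetical: pointer to the pos-th UTF-8 character of utf, or NULL
 * if pos is negative, past the last character, or a malformed sequence
 * is met. The string is walked once; no character count is needed. */
const unsigned char *sketch_utf8_strpos(const unsigned char *utf, int pos) {
    if (utf == NULL || pos < 0)
        return NULL;
    while (pos-- > 0) {
        unsigned char c = *utf++;

        if (c == '\0')
            return NULL;                /* ran off the end: out of range */
        if (c & 0x80) {
            if ((c & 0xC0) != 0xC0)
                return NULL;            /* stray continuation byte */
            /* One continuation byte per extra leading 1 bit. */
            while ((c <<= 1) & 0x80) {
                if ((*utf++ & 0xC0) != 0x80)
                    return NULL;        /* truncated or malformed */
            }
        }
    }
    return *utf != '\0' ? utf : NULL;
}
```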
* gentest.py testapi.c: autogenerate a minimal NULL value sequence
for unknown pointer types
* HTMLparser.c SAX2.c chvalid.c encoding.c entities.c parser.c
parserInternals.c relaxng.c valid.c xmlIO.c xmlreader.c
xmlsave.c xmlschemas.c xmlschemastypes.c xmlstring.c xpath.c
xpointer.c: This uncovered an impressive number of entry points
not checking for NULL pointers when they ought to; closed all
the open gaps.
Daniel
* xmlmemory.c include/libxml/xmlmemory.h: adding xmlMemBlocks()
* Makefile.am gentest.py testapi.c: work on generator of an
automatic API regression test tool.
* SAX2.c nanoftp.c parser.c parserInternals.c tree.c xmlIO.c
xmlstring.c: various API hardening changes as a result of running
the first set of automatic API regression tests.
* test/slashdot16.xml: apparently missing from CVS, committed it
Daniel
* catalog.c: added code to handle <group>, including dumping
to output (bug 151924).
* xmlcatalog.c, xmlstring.c, parser.c: minor compiler warning
cleanup (no change to logic)
* SAX2.c: fixed bug introduced during OOM fixup causing problems
with default namespace when a named prefix with the same href
was present (reported on the mailing list by Karl Eichwalder).
* xmlstring.c: modified xmlCheckUTF8 with suggested code from
Julius Mittenzwei.
* dict.c: added a typecast to try to avoid a problem reported by
Pascal Rodes.
* tree.c: Dodji pointed out a bug in xmlGetNodePath()
* xmlcatalog.c: applied patch from Albert Chin to add a
--no-super-update option to xmlcatalog (see #145461),
and another patch, also from Albert Chin, to not crash
on -sgml --del without args (see #145462)
* Makefile.am: applied another patch from Albert Chin to
fix a problem with diff on Solaris #145511
* xmlstring.c: fix xmlCheckUTF8() according to the suggestion
in bug #148115
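A UTF-8 validity check along the lines of RFC 3629 is sketched below. It accepts a lead byte plus the right number of continuation bytes, and it also rejects overlong forms, surrogates, and code points above U+10FFFF. This is a sketch of the general technique. It is not the code suggested in bug #148115 or by Julius Mittenzwei, and it may be stricter than libxml2's xmlCheckUTF8. `sketch_check_utf8` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical: returns 1 if s is well-formed UTF-8 per RFC 3629,
 * 0 otherwise. Walks with a pointer, so no int index can overflow. */
int sketch_check_utf8(const unsigned char *s) {
    if (s == NULL)
        return 0;
    while (*s != '\0') {
        unsigned char c = *s++;
        unsigned long cp, min;
        int n, i;

        if (c < 0x80)
            continue;                               /* ASCII */
        if ((c & 0xE0) == 0xC0)      { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else
            return 0;               /* continuation or invalid lead byte */

        for (i = 0; i < n; i++) {
            /* '\0' is not a continuation byte, so this stops at the end. */
            if ((s[i] & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;               /* overlong, out of range, or surrogate */
        s += n;
    }
    return 1;
}
```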
* python/libxml.py: apply fix from Marc-Antoine Parent about
the errors in libxml(2).py on the node wrapper #135547
Daniel