Fix many places where malloc failures aren't reported.
Make xmlErrMemory public. This is useful for custom external entity
loaders.
Introduce new API function xmlSwitchEncodingName.
Change the way how we store whether the the parser is stopped. This used
to be signaled by setting ctxt->instate to XML_PARSER_EOF which was
misdesigned and error-prone. Set ctxt->disableSAX to 2 instead and
introduce a macro PARSER_STOPPED. Also stop to remove parser inputs in
xmlHaltParser. This allows to remove many checks of ctxt->instate.
Introduce xmlErrParser to handle errors if a parser context is
available.
Fix many places where malloc failures aren't reported.
Rework XPath object cache to store free objects in a linked list to
avoid allocating an additional array. Remove some unneeded object pools.
Fix many places where malloc failures weren't reported, for example
after calling xmlStrdup.
Introduce new public API functions that return a separate error code if
a memory allocation fails:
- xmlParseURISafe
- xmlBuildURISafe
- xmlBuildRelativeURISafe
Update the fuzzer to check whether malloc failures are reported.
Don't create a copy of the whole input buffer. Read the data chunk by
chunk to save memory.
Historically, it was probably envisioned to read data from memory
without additional copying. This doesn't work reliably with the current
design of the XML parser which requires a terminating null byte at the
end of input buffers. This lead to xmlReadMemory interfaces, which
expect pointer and size arguments, being changed to make a
zero-terminated copy of the input buffer. Interfaces based on
xmlReadDoc, which actually expect a zero-terminated string and
would make zero-copy operation work, were then simplified to rely on
xmlReadMemoryi, resulting in an unnecessary copy.
To avoid copying (possibly gigabytes) of memory temporarily, we now
stream in-memory input just like content read from files in a
chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of
250 bytes). As a side effect, we also avoid another copy of the whole
input when handling non-UTF-8 data which was made possible by some
earlier commits.
Interfaces expecting zero-terminated strings now make use of strnlen
which unfortunately isn't part of the standard C library and only
mandated since POSIX 2008.
XIncludes involve XPath processing which can still lead to timeouts when
fuzzing. This will probably take a while to fix. The rest of the XML
parsing code should hopefully run without timeouts now. OSS-Fuzz only
shows a single timeout test case, so separate the XInclude from the core
XML fuzzer.
Now that entity expansion issues should be fixed, we should get more
interesting timeout errors from OSS-Fuzz. Disable XInclude for now,
since it often timeouts in XPath computations. The XInclude tests should
be moved to a separate fuzz target.
With very few exceptions, utilities and test programs don't require any
external libraries.
- xmllint and xmlcatalog need libreadline
- runtest and testThreads need pthreads
Brown paper bag release, some recently added sources were missing from
the 2.9.11 tarball:
- configure.ac: bump version
- fuzz/Makefile.am: add fuzz.h and seed/regexp to EXTRA_DIST
OSS-Fuzz has been fuzzing the HTML parser with inputs up to 1 MB for
several hundred hours without hitting the 20s timeout. It seems that
most timeouts resulting from accidentally quadratic behavior in the
HTML parser have been fixed. Start to gradually reduce the timeout to
find new performance issues.