When creating an xmlTextReaderPtr using xmlReaderForMemory(),
there are two optional API functions that can be used:
- xmlTextReaderClose() may be called prior to calling
xmlFreeTextReader() to free parsing resources and close the
xmlTextReaderPtr without freeing it.
- xmlTextReaderCurrentDoc() may be called to return an
xmlDocPtr that's owned by the caller, and must be free using
xmlFreeDoc() after calling xmlFreeTextReader().
The use-after-free issues occur when calling
xmlTextReaderClose() before xmlFreeTextReader(), with different
issues occurring depending on whether xmlTextReaderCurrentDoc()
is also called.
* xmlreader.c:
(xmlFreeTextReader):
- Move code to xmlTextReaderClose(), remove duplicate code, and
call xmlTextReaderClose() if it hasn't been called yet.
(xmlTextReaderClose):
- Move call to xmlFreeNode(reader->faketext) from
xmlFreeTextReader() to fix a use-after-free bug when calling
xmlTextReaderClose() before xmlFreeTextReader(), but not when
using xmlTextReaderCurrentDoc(). The bug was introduced in
2002 by commit beb70bd39. In 2009 commit f4653dcd8 fixed the
use-after-free that occurred every time xmlFreeTextReader()
was called, but not the case where xmlTextReaderClose() was
called first.
- Move post-parsing validation code from xmlFreeTextReader() to
fix a second use-after-free when calling xmlTextReaderClose()
before xmlFreeTextReader(). This regressed in v2.9.10 with
commit 57a3af56f.
In most places, we really need the double-it scheme to avoid quadratic
behavior. The hybrid scheme still can cause many reallocations and the
bounded scheme doesn't seem to provide meaningful protection in
xmlreader.c.
This code has been broken and deprecated since version 2.6.0, released
in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never
compiled since 2012. I couldn't find a Debian package using any of its
symbols, so it seems safe to remove this module.
Now that no references to ID and IDREF attributes are stored in
streaming validation mode, there's no need to try and remove them.
Also remove xmlTextReaderFreeIDTable which was identical to
xmlFreeIDTable.
Don't process XIncludes in the result of another inclusion to avoid
infinite recursion resulting in a call stack overflow.
This is something the XInclude engine shouldn't allow but correct
handling of intra-document includes would require major changes.
Found by OSS-Fuzz.
xml:id creates ID attributes even in documents without a DTD, so the
check in xmlTextReaderFreeProp must be changed to avoid use after free.
Found by OSS-Fuzz.
Keeping objects on a free list can hide memory errors. Only allow a
single node on free lists used by the XML reader when fuzzing. This
should hide fewer errors while still exercising the free list logic.
Just like IDs, IDREF attributes must be removed from the document's
refs table when they're freed by a reader. This bug is often hidden
because xmlAttr structs are reused and strings are stored in a
dictionary unless XML_PARSE_NODICT is specified.
Found by OSS-Fuzz.
The getEntity handler was already invoked by xmlParseReference, so it's
useless to call it again. After the recent change, xmlSAX2GetEntity
won't load any kind of entities anyway.
The parser context stores some options both in the "options" bits and
extra members like "validate" or "replaceEntities". Which of these
are actually read is inconsistent, so make sure to also update the
bit field.
Recent commit 1fbcf40 caused a use-after-free read because it didn't
account for the fact that xmlTextReaderFreeDoc frees entities before
freeing entity references via xmlTextReaderFreeNodeList.
Found by OSS-Fuzz.
Fix a regression caused by commit 39fbfb4f. If xmlTextReaderReadOuterXml
is called on a pristine xmlReader, the current node is NULL and must not
be dereferenced. Move the call to xmlTextReaderExpand to the start of
the function to make sure that we have a valid node.
Fixes#43.
In error cases, there might still be elements in the vstate table.
Since vstateVPop in valid.c is private, we have to pop the elements
with xmlValidatePopElement. This inspects nodes of the document, so
the reader doc must be freed after the clearing the vstate table.
Found by OSS-Fuzz.
Otherwise the encoding of the document is ignored and non-ASCII
characters are serialized as numeric references even if the encoding
is specified as UTF-8.
This fixes the traversal of parent nodes using xmlTextReaderNext()
when the reader is based on a preparsed document (created using
xmlReaderWalker(doc)).
Without this fix the parser will abort even though there are parent
nodes it should traverse to, if it is not currently on an element or
attribute node. This is incorrect, since it can be for example on a
text node when it needs to enter backtracking.
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.
Fixes bug 784861.
One of the operation on the reader could resolve entities
leading to the classic expansion issue. Make sure the
buffer used for xmlreader operation is bounded.
Introduce a new allocation type for the buffers for this effect.
Things now work correctly at the xmllint level:
thinkpad:~/XML -> xmllint --sax --noout --schema test_schema.xsd
test_xml.xml
test_xml.xml:72721: Schemas validity error : Element 'level1': Missing
child element(s). Expected is ( level2 ).
test_xml.xml fails to validate
thinkpad:~/XML -> xmllint --stream --schema test_schema.xsd test_xml.xml
test_xml.xml:72721: Schemas validity error : Element 'level1': Missing
child element(s). Expected is ( level2 ).
test_xml.xml fails to validate
thinkpad:~/XML ->
* error.c: fix a corner case of not reporting lines when we should
* include/libxml/xmlschemas.h doc/symbols.xml: had to add new entry
points to set the filename on a validation context and a locator
callback used to fetch the line and file from the context
* xmlschemas.c: add the new entry points xmlSchemaValidateSetFilename()
and xmlSchemaValidateSetLocator(), plus make sure the error reporting
routine gets the information if available. Add a locator for SAX.
* xmlreader.c: add and plug a locator for readers.
This was scattered in a number of modules, xmlParserInputPtr
have usually their base, cur and end pointer set from an
xmlBuf used as input.
* buf.c buf.h: add a new function implementing this setup
* parser.c HTMLparser.c catalog.c parserInternals.c xmlreader.c
use the new function instead of digging into the buffer in
all those modules
For https://bugzilla.gnome.org/show_bug.cgi?id=654567
I use xmlTextReader to parse failed that might be incomplete. These files are
the beginning of a well-formed file, but the end is missing so the file as a
whole is not well-formed.
The problem is that xmlTextReader starts returning errors when it encounters
the early EOF, even though I haven't finished reading all of the valid data in
the file. It would be helpful if xmlTextReader kept working until the very
end.