When an entity reference cycle is detected, the entity content is
cleared by setting its first byte to zero. But the entity content might
be allocated from a dict. In this case, the dict entry becomes corrupted
leading to all kinds of logic errors, including memory errors like
double-frees.
Stop storing entity content, orig, ExternalID and SystemID in a dict.
These values are unlikely to occur multiple times in a document, so they
shouldn't have been stored in a dict in the first place.
Thanks to Ned Williamson and Nathan Wachholz working with Google Project
Zero for the report!
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
Downgrade the error message to a warning since the error was ignored,
anyway. Also print the name of redeclared entity. For a proper fix that
also shows filename and line number of the invalid redeclaration, we'd
have to
- pass the parser context to the entity functions somehow, or
- make these functions return distinct error codes.
Partial fix for #308.
Code is currently assuming UTF-8 without validating. Truncated UTF-8
input can cause out-of-bounds array access.
Adds further checks to partial fix in 50f06b3e.
Fixes#178
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.
Note that some test cases declared
<!ENTITY lt "<">
But the XML spec clearly states that this is illegal:
> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.
Also fixes#217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.
As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.
Fixes bug 784861.
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
ask conversion of code to ISO Latin 1 to avoid having the compiler assume
EBCDIC codepoint for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special avoid dsome of the expectstions from
Unix/Windows
For https://bugzilla.gnome.org/show_bug.cgi?id=689483
It seems there are functions that do use the const qualifier for some of the
arguments, but it seems that there are a lot of functions that don't use it and
probably should.
So I created a patch against 2.9.0 that makes as much as possible const in
tree.h, and changed other files as needed.
There were a lot of cases like "const xmlNodePtr node". This doesn't actually
do anything, there the *pointer* is constant not the object it points to. So I
changed those to "const xmlNode *node".
I also removed some consts, mostly in the Copy functions, because those
functions can actually modify the doc or node they copy from
Handle special cases of &{...} constructs as hinted in the spec
http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
* HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c
testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c:
fix unused variables, or unneeded increments as well as a couple
of space issues
* runtest.c: check for NULL before calling unlink()
* xmlreader.c: applied patch from Aswin to fix tree skipping
* include/libxml/entities.h entities.c: fixed a comment and
added a new xmlNewEntity() entry point
* runtest.c: be less verbose
* tree.c: space and tabs cleanups
daniel
svn path=/trunk/; revision=3774
* include/libxml/entities.h entities.c SAX2.c parser.c: rework
the patch to avoid some ABI issue with people allocating
entities structure directly
Daniel
svn path=/trunk/; revision=3773
* include/libxml/entities.h entities.c SAX2.c parser.c: trying to
fix entities behaviour when using SAX, had to extend entities
content and hack on the entities processing code, but that should
fix the long standing bug #159219
Daniel
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
format to cope with gcc4 change of aliasing allowed scopes, had
to add extra informations to doc/libxml2-api.xml to separate
the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
* hash.c include/libxml/hash.h: added xmlHashCreateDict where
the hash reuses the dictionnary for internal strings
* entities.c valid.c parser.c: reuse that new API, leads to a decent
speedup when parsing for example DocBook documents.
Daniel
* gentest.py testapi.c: autogenerate a minimal NULL value sequence
for unknown pointer types
* HTMLparser.c SAX2.c chvalid.c encoding.c entities.c parser.c
parserInternals.c relaxng.c valid.c xmlIO.c xmlreader.c
xmlsave.c xmlschemas.c xmlschemastypes.c xmlstring.c xpath.c
xpointer.c: This uncovered an impressive amount of entry points
not checking for NULL pointers when they ought to, closing all
the open gaps.
Daniel
* gentest.py testapi.c: fixed the way the generator works,
extended the testing, especially with more real trees and nodes.
* HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch
of real problems found and fixed.
* entities.c: fix error reporting to go through the new handlers
Daniel
* tree.c: avoid returning default namespace when searching
from an attribute
* entities.c xmlwriter.c: reverse xmlEncodeSpecialChars() behaviour
back to escaping " since the normal serialization routines do not
use it anymore, should close bug #134477 . Tried to make
the writer avoid it too but it didn't work.
Daniel
* entities.c: fixed#127877, never output " in element content
* result/isolat3 result/slashdot16.xml result/noent/isolat3
result/noent/slashdot16.xml result/valid/REC-xml-19980210.xml
result/valid/index.xml result/valid/xlink.xml: this changes the
output of a few tests
Daniel
* include/libxml/parserInternals.h HTMLparser.c HTMLtree.c
SAX2.c catalog.c debugXML.c entities.c parser.c relaxng.c
testSAX.c tree.c valid.c xmlschemas.c xmlschemastypes.c
xpath.c: Changed all (?) occurences where validation macros
(IS_xxx) had single-byte arguments to use IS_xxx_CH instead
(e.g. IS_BLANK changed to IS_BLANK_CH). This gets rid of
many warning messages on certain platforms, and also high-
lights places in the library which may need to be enhanced
for proper UTF8 handling.
* entities.c legacy.c parser.c: made the predefined entities
static predefined structures to avoid the work, memory and
hazards associated to initialization/cleanup.
Daniel
* configure.in entities.c tree.c valid.c xmllint.c
include/libxml/tree.h include/libxml/xmlversion.h.in:
Adding a configure option to remove tree manipulation
code which is not strictly needed by the parser.
Daniel
* Makefile.am: cleanup, creating a new legacy.c module,
made sure make tests ran in reduced conditions
* SAX.c SAX2.c configure.in entities.c globals.c parser.c
parserInternals.c tree.c valid.c xlink.c xmlIO.c xmlcatalog.c
xmlmemory.c xpath.c xmlmemory.c include/libxml/xmlversion.h.in:
increased the modularization, allow to configure out
validation code and legacy code, added a configuration
option --with-minimum compiling only the mandatory code
which then shrink to 200KB.
Daniel
* parser.c: fix a bug raised by the Mips compiler.
* include/libxml/SAX.h include/libxml/parser.h: move the
SAXv1 block definitions to parser.h fixes bug #123380
* xmlreader.c include/libxml/xmlreader.h: reinstanciate
the attribute and element pool borken 2 commits ago.
Start playing with an entry point to preserve a subtree.
* entities.c: remove a warning.
Daniel
* DOCBparser.c HTMLparser.c entities.c parser.c relaxng.c
xmlschemas.c xpath.c: removed some warnings by casting xmlChar
to unsigned int and a couple of others.
* xmlschemastypes.c: fixes a segfault on empty hexBinary strings
Daniel
* parser.c: another fix for nodeinfo in entities problem
* tree.c entities.c: fixed bug #106788 from James Clark
some spaces need to be serialized as character references.
Daniel