Make sure that references from IDs are updated.
Note that if there are IDs with the same value in a document, the last
one will now be returned. IDs should be unique, but maybe this should be
addressed.
Fix places where malloc failures aren't reported.
Introduce new API function xmlAddEntity that returns separate error
codes.
Don't invoke global error handler for low-level errors which should be
handled by higher layers.
Invalid redelcaration warnings will be fixed later.
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.
We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.
We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.
A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA work which have caused problems in the past.
The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.
Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.
Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.
Fixes#294.
Fixes#345.
When an entity reference cycle is detected, the entity content is
cleared by setting its first byte to zero. But the entity content might
be allocated from a dict. In this case, the dict entry becomes corrupted
leading to all kinds of logic errors, including memory errors like
double-frees.
Stop storing entity content, orig, ExternalID and SystemID in a dict.
These values are unlikely to occur multiple times in a document, so they
shouldn't have been stored in a dict in the first place.
Thanks to Ned Williamson and Nathan Wachholz working with Google Project
Zero for the report!
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
Downgrade the error message to a warning since the error was ignored,
anyway. Also print the name of redeclared entity. For a proper fix that
also shows filename and line number of the invalid redeclaration, we'd
have to
- pass the parser context to the entity functions somehow, or
- make these functions return distinct error codes.
Partial fix for #308.
Code is currently assuming UTF-8 without validating. Truncated UTF-8
input can cause out-of-bounds array access.
Adds further checks to partial fix in 50f06b3e.
Fixes#178
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.
Note that some test cases declared
<!ENTITY lt "<">
But the XML spec clearly states that this is illegal:
> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.
Also fixes#217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.
As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.
Fixes bug 784861.
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
ask conversion of code to ISO Latin 1 to avoid having the compiler assume
EBCDIC codepoint for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special avoid dsome of the expectstions from
Unix/Windows
For https://bugzilla.gnome.org/show_bug.cgi?id=689483
It seems there are functions that do use the const qualifier for some of the
arguments, but it seems that there are a lot of functions that don't use it and
probably should.
So I created a patch against 2.9.0 that makes as much as possible const in
tree.h, and changed other files as needed.
There were a lot of cases like "const xmlNodePtr node". This doesn't actually
do anything, there the *pointer* is constant not the object it points to. So I
changed those to "const xmlNode *node".
I also removed some consts, mostly in the Copy functions, because those
functions can actually modify the doc or node they copy from
Handle special cases of &{...} constructs as hinted in the spec
http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
* HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c
testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c:
fix unused variables, or unneeded increments as well as a couple
of space issues
* runtest.c: check for NULL before calling unlink()
* xmlreader.c: applied patch from Aswin to fix tree skipping
* include/libxml/entities.h entities.c: fixed a comment and
added a new xmlNewEntity() entry point
* runtest.c: be less verbose
* tree.c: space and tabs cleanups
daniel
svn path=/trunk/; revision=3774
* include/libxml/entities.h entities.c SAX2.c parser.c: rework
the patch to avoid some ABI issue with people allocating
entities structure directly
Daniel
svn path=/trunk/; revision=3773
* include/libxml/entities.h entities.c SAX2.c parser.c: trying to
fix entities behaviour when using SAX, had to extend entities
content and hack on the entities processing code, but that should
fix the long standing bug #159219
Daniel
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
format to cope with gcc4 change of aliasing allowed scopes, had
to add extra informations to doc/libxml2-api.xml to separate
the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
* hash.c include/libxml/hash.h: added xmlHashCreateDict where
the hash reuses the dictionnary for internal strings
* entities.c valid.c parser.c: reuse that new API, leads to a decent
speedup when parsing for example DocBook documents.
Daniel
* gentest.py testapi.c: autogenerate a minimal NULL value sequence
for unknown pointer types
* HTMLparser.c SAX2.c chvalid.c encoding.c entities.c parser.c
parserInternals.c relaxng.c valid.c xmlIO.c xmlreader.c
xmlsave.c xmlschemas.c xmlschemastypes.c xmlstring.c xpath.c
xpointer.c: This uncovered an impressive amount of entry points
not checking for NULL pointers when they ought to, closing all
the open gaps.
Daniel
* gentest.py testapi.c: fixed the way the generator works,
extended the testing, especially with more real trees and nodes.
* HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch
of real problems found and fixed.
* entities.c: fix error reporting to go through the new handlers
Daniel
* tree.c: avoid returning default namespace when searching
from an attribute
* entities.c xmlwriter.c: reverse xmlEncodeSpecialChars() behaviour
back to escaping " since the normal serialization routines do not
use it anymore, should close bug #134477 . Tried to make
the writer avoid it too but it didn't work.
Daniel
* entities.c: fixed#127877, never output " in element content
* result/isolat3 result/slashdot16.xml result/noent/isolat3
result/noent/slashdot16.xml result/valid/REC-xml-19980210.xml
result/valid/index.xml result/valid/xlink.xml: this changes the
output of a few tests
Daniel
* include/libxml/parserInternals.h HTMLparser.c HTMLtree.c
SAX2.c catalog.c debugXML.c entities.c parser.c relaxng.c
testSAX.c tree.c valid.c xmlschemas.c xmlschemastypes.c
xpath.c: Changed all (?) occurences where validation macros
(IS_xxx) had single-byte arguments to use IS_xxx_CH instead
(e.g. IS_BLANK changed to IS_BLANK_CH). This gets rid of
many warning messages on certain platforms, and also high-
lights places in the library which may need to be enhanced
for proper UTF8 handling.