1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00
Commit Graph

118 Commits

Author SHA1 Message Date
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
9d80a2b134 entities: Don't change doc when encoding entities
doc->encoding shouldn't be touched by xmlEncodeEntitiesInternal.
2023-08-17 12:47:14 +02:00
Nick Wellnhofer
ce76ebfd13 entities: Stop counting entities
This was only used in the old version of xmlParserEntityCheck.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
463bbeeca1 entities: Rework entity amplification checks
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.

We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.

We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.

A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA work which have caused problems in the past.

The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.

Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.

Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.

Fixes #294.
Fixes #345.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
f34f184f8e entities: Add "flags" member to struct xmlEntity
This will hold various flags and eventually replace the "checked"
member.
2022-12-19 15:24:53 +01:00
Nick Wellnhofer
2059df5358 buf: Deprecate static/immutable buffers 2022-11-20 21:16:03 +01:00
Nick Wellnhofer
644a89e080 [CVE-2022-40304] Fix dict corruption caused by entity reference cycles
When an entity reference cycle is detected, the entity content is
cleared by setting its first byte to zero. But the entity content might
be allocated from a dict. In this case, the dict entry becomes corrupted
leading to all kinds of logic errors, including memory errors like
double-frees.

Stop storing entity content, orig, ExternalID and SystemID in a dict.
These values are unlikely to occur multiple times in a document, so they
shouldn't have been stored in a dict in the first place.

Thanks to Ned Williamson and Nathan Wachholz working with Google Project
Zero for the report!
2022-10-14 15:02:06 +02:00
Nick Wellnhofer
2cac626976 Don't use sizeof(xmlChar) or sizeof(char) 2022-09-01 03:35:19 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
f550977295 Fix documentation in entities.c 2022-02-20 22:06:16 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
ce0871e15c Only warn on invalid redeclarations of predefined entities
Downgrade the error message to a warning since the error was ignored,
anyway. Also print the name of redeclared entity. For a proper fix that
also shows filename and line number of the invalid redeclaration, we'd
have to

- pass the parser context to the entity functions somehow, or
- make these functions return distinct error codes.

Partial fix for #308.
2022-02-20 21:49:04 +01:00
Joel Hockey
bf22713507 Validate UTF8 in xmlEncodeEntities
Code is currently assuming UTF-8 without validating. Truncated UTF-8
input can cause out-of-bounds array access.

Adds further checks to partial fix in 50f06b3e.

Fixes #178
2021-04-22 11:57:32 +02:00
Nick Wellnhofer
cbe1212db6 Fix null deref introduced with previous commit
Found by OSS-Fuzz.
2021-02-09 17:07:21 +01:00
Nick Wellnhofer
01411e7c5e Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
2021-02-08 21:51:26 +01:00
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
e03f0a199a Fix hash callback signatures
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.

Fixes bug 784861.
2017-11-09 16:42:47 +01:00
Stéphane Michaut
454e397eb7 Porting libxml2 on zOS encoding of code
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
  ask conversion of code to ISO Latin 1 to avoid having the compiler assume
  EBCDIC codepoint for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special avoid dsome of the expectstions from
  Unix/Windows
2017-08-28 14:30:43 +02:00
David Kilzer
4472c3a5a5 Fix some format string warnings with possible format string vulnerability
For https://bugzilla.gnome.org/show_bug.cgi?id=761029

Decorate every method in libxml2 with the appropriate
LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups
following the reports.
2016-05-23 15:01:07 +08:00
Kurt Roeckx
95ebe53b50 Fix and add const qualifiers
For https://bugzilla.gnome.org/show_bug.cgi?id=689483

It seems there are functions that do use the const qualifier for some of the
arguments, but it seems that there are a lot of functions that don't use it and
probably should.

So I created a patch against 2.9.0 that makes as much as possible const in
tree.h, and changed other files as needed.

There were a lot of cases like "const xmlNodePtr node".  This doesn't actually
do anything, there the *pointer* is constant not the object it points to. So I
changed those to "const xmlNode *node".

I also removed some consts, mostly in the Copy functions, because those
functions can actually modify the doc or node they copy from
2014-10-13 16:06:21 +08:00
Daniel Veillard
0ab8ce5302 Switched comment in file to UTF-8 encoding 2013-03-30 22:33:05 +08:00
Daniel Veillard
7651606f31 Various cleanups to avoid compiler warnings 2012-09-11 14:02:08 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
7d4c529a33 Improve HTML escaping of attribute on output
Handle special cases of &{...} constructs as hinted in the spec
  http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
2012-09-05 12:11:43 +08:00
Aron Xu
baaf03f80f Fix an error in previous commit 2012-07-20 15:41:34 +08:00
Daniel Veillard
4f9fdc709c Fix entities local buffers size problems 2012-07-18 17:54:05 +08:00
Daniel Veillard
13cee4e37b Fix a bunch of scan 'dead increments' and cleanup
* HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c
  testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c:
  fix unused variables, or unneeded increments as well as a couple
  of space issues
* runtest.c: check for NULL before calling unlink()
2009-09-05 14:52:55 +02:00
Daniel Veillard
aa6de47ebf applied patch from Aswin to fix tree skipping fixed a comment and added a
* xmlreader.c: applied patch from Aswin to fix tree skipping
* include/libxml/entities.h entities.c: fixed a comment and
  added a new xmlNewEntity() entry point
* runtest.c: be less verbose
* tree.c: space and tabs cleanups
daniel

svn path=/trunk/; revision=3774
2008-08-25 14:53:31 +00:00
Daniel Veillard
f4f4e4853a rework the patch to avoid some ABI issue with people allocating entities
* include/libxml/entities.h entities.c SAX2.c parser.c: rework
  the patch to avoid some ABI issue with people allocating
  entities structure directly
Daniel

svn path=/trunk/; revision=3773
2008-08-25 08:57:48 +00:00
Daniel Veillard
4bf899bf1b fix for CVE-2008-3281 Daniel
* include/libxml/parser.h include/libxml/entities.h entities.c
  parserInternals.c parser.c: fix for CVE-2008-3281
Daniel

svn path=/trunk/; revision=3772
2008-08-20 17:04:30 +00:00
Daniel Veillard
a37a6ad91a trying to fix entities behaviour when using SAX, had to extend entities
* include/libxml/entities.h entities.c SAX2.c parser.c: trying to
  fix entities behaviour when using SAX, had to extend entities
  content and hack on the entities processing code, but that should
  fix the long standing bug #159219
Daniel
2006-10-10 20:05:45 +00:00
Daniel Veillard
2728f845c5 more cleanups based on coverity reports. Daniel
* SAX2.c catalog.c encoding.c entities.c example/gjobread.c
  python/libxml.c: more cleanups based on coverity reports.
Daniel
2006-03-09 16:49:24 +00:00
Daniel Veillard
5d4644ef6e revamped the elfgcchack.h format to cope with gcc4 change of aliasing
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
  format to cope with gcc4 change of aliasing allowed scopes, had
  to add extra informations to doc/libxml2-api.xml to separate
  the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
  and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
2005-04-01 13:11:58 +00:00
Daniel Veillard
316a5c3989 added xmlHashCreateDict where the hash reuses the dictionnary for internal
* hash.c include/libxml/hash.h: added xmlHashCreateDict where
  the hash reuses the dictionnary for internal strings
* entities.c valid.c parser.c: reuse that new API, leads to a decent
  speedup when parsing for example DocBook documents.
Daniel
2005-01-23 22:56:39 +00:00
Daniel Veillard
7da92709c8 small speedup in skipping blanks characters interning the entities strings
* parser.c: small speedup in skipping blanks characters
* entities.c: interning the entities strings
Daniel
2005-01-23 20:15:53 +00:00
Daniel Veillard
ce682bc24b autogenerate a minimal NULL value sequence for unknown pointer types This
* gentest.py testapi.c: autogenerate a minimal NULL value sequence
  for unknown pointer types
* HTMLparser.c SAX2.c chvalid.c encoding.c entities.c parser.c
  parserInternals.c relaxng.c valid.c xmlIO.c xmlreader.c
  xmlsave.c xmlschemas.c xmlschemastypes.c xmlstring.c xpath.c
  xpointer.c: This uncovered an impressive amount of entry points
  not checking for NULL pointers when they ought to, closing all
  the open gaps.
Daniel
2004-11-05 17:22:25 +00:00
Daniel Veillard
8e725fb4de fixed a compilation problem on a recent change Daniel
* entities.c: fixed a compilation problem on a recent change
Daniel
2004-11-05 14:16:50 +00:00
Daniel Veillard
ce244ad595 fixed the way the generator works, extended the testing, especially with
* gentest.py testapi.c: fixed the way the generator works,
  extended the testing, especially with more real trees and nodes.
* HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch
  of real problems found and fixed.
* entities.c: fix error reporting to go through the new handlers
Daniel
2004-11-05 10:03:46 +00:00
Daniel Veillard
62040be360 avoid returning default namespace when searching from an attribute reverse
* tree.c: avoid returning default namespace when searching
  from an attribute
* entities.c xmlwriter.c: reverse xmlEncodeSpecialChars() behaviour
  back to escaping " since the normal serialization routines do not
  use it anymore, should close bug #134477 . Tried to make
  the writer avoid it too but it didn't work.
Daniel
2004-05-17 03:17:26 +00:00
Daniel Veillard
18ab8721ff fixed an XML entites content serialization potentially triggered by
* entities.c: fixed an XML entites content serialization
  potentially triggered by XInclude, see #126817
Daniel
2003-12-09 22:51:37 +00:00
Daniel Veillard
d45325589d fixed #127877, never output &quot; in element content this changes the
* entities.c: fixed #127877, never output &quot; in element content
* result/isolat3 result/slashdot16.xml result/noent/isolat3
  result/noent/slashdot16.xml result/valid/REC-xml-19980210.xml
  result/valid/index.xml result/valid/xlink.xml: this changes the
  output of a few tests
Daniel
2003-11-25 18:29:55 +00:00
William M. Brack
9e66059f08 fixed problem reported on the mailing list by Melvyn Sopacua - wrong
* entities.c, valid.c: fixed problem reported on the mailing
  list by Melvyn Sopacua - wrong argument order on functions
  called through xmlHashScan.
2003-10-20 14:56:06 +00:00
William M. Brack
76e95df055 Changed all (?) occurences where validation macros (IS_xxx) had
* include/libxml/parserInternals.h HTMLparser.c HTMLtree.c
  SAX2.c catalog.c debugXML.c entities.c parser.c relaxng.c
  testSAX.c tree.c valid.c xmlschemas.c xmlschemastypes.c
  xpath.c: Changed all (?) occurences where validation macros
  (IS_xxx) had single-byte arguments to use IS_xxx_CH instead
  (e.g. IS_BLANK changed to IS_BLANK_CH).  This gets rid of
  many warning messages on certain platforms, and also high-
  lights places in the library which may need to be enhanced
  for proper UTF8 handling.
2003-10-18 16:20:14 +00:00
Daniel Veillard
b2517d850d Fix error on output of high codepoint charref like &#x10FFFF; , reported
* entities.c: Fix error on output of high codepoint charref like
  &#x10FFFF; , reported by Eric Hanchrow
Daniel
2003-10-01 19:13:56 +00:00
Daniel Veillard
d3a2e4c2b3 made the predefined entities static predefined structures to avoid the
* entities.c legacy.c parser.c: made the predefined entities
  static predefined structures to avoid the work, memory and
  hazards associated to initialization/cleanup.
Daniel
2003-09-30 13:38:04 +00:00
Daniel Veillard
652327a727 Adding a configure option to remove tree manipulation code which is not
* configure.in entities.c tree.c valid.c xmllint.c
  include/libxml/tree.h include/libxml/xmlversion.h.in:
  Adding a configure option to remove tree manipulation
  code which is not strictly needed by the parser.
Daniel
2003-09-29 18:02:38 +00:00
Daniel Veillard
a9cce9cd0d Okay this is scary but it is just adding a configure option to disable
* HTMLtree.c SAX2.c c14n.c catalog.c configure.in debugXML.c
  encoding.c entities.c nanoftp.c nanohttp.c parser.c relaxng.c
  testAutomata.c testC14N.c testHTML.c testRegexp.c testRelax.c
  testSchemas.c testXPath.c threads.c tree.c valid.c xmlIO.c
  xmlcatalog.c xmllint.c xmlmemory.c xmlreader.c xmlschemas.c
  example/gjobread.c include/libxml/HTMLtree.h include/libxml/c14n.h
  include/libxml/catalog.h include/libxml/debugXML.h
  include/libxml/entities.h include/libxml/nanohttp.h
  include/libxml/relaxng.h include/libxml/tree.h
  include/libxml/valid.h include/libxml/xmlIO.h
  include/libxml/xmlschemas.h include/libxml/xmlversion.h.in
  include/libxml/xpathInternals.h python/libxml.c:
  Okay this is scary but it is just adding a configure option
  to disable output, this touches most of the files.
Daniel
2003-09-29 13:20:24 +00:00
Daniel Veillard
4432df239b cleanup, creating a new legacy.c module, made sure make tests ran in
* Makefile.am: cleanup, creating a new legacy.c module,
  made sure make tests ran in reduced conditions
* SAX.c SAX2.c configure.in entities.c globals.c parser.c
  parserInternals.c tree.c valid.c xlink.c xmlIO.c xmlcatalog.c
  xmlmemory.c xpath.c xmlmemory.c include/libxml/xmlversion.h.in:
  increased the modularization, allow to configure out
  validation code and legacy code, added a configuration
  option --with-minimum compiling only the mandatory code
  which then shrink to 200KB.
Daniel
2003-09-28 18:58:27 +00:00