Reporting errors is expensive and some abusive test cases can generate
an error for each invalid input byte. This causes the parser to spend
most of the time with error handling. Limit the number of errors and
warnings to 100.
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.
We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.
We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.
A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA work which have caused problems in the past.
The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.
Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.
Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.
Fixes#294.
Fixes#345.
Remove inaccurate xmlParseCheckTransition check.
Remove non-incremental xmlParseGetLasts check.
Add functions that check for several boundary constructs more
accurately, keeping track of progress in ctxt->checkIndex.
Fixes#439.
This code has been broken and deprecated since version 2.6.0, released
in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never
compiled since 2012. I couldn't find a Debian package using any of its
symbols, so it seems safe to remove this module.
Make the parser context's "pushTab" point to an array of structs
instead of void pointers. This avoids casting unrelated types to void
pointers, improving readability and portability, and allows for more
efficient packing. Ultimately, the struct could be extended to include
the contents of "nameTab" and "spaceTab", further simplifying the code.
Historically, "pushTab" was only used by the push parser (hence the
name), so the change to the public headers should be safe.
Also remove an unused parameter from xmlParseEndTag2.
For https://bugzilla.gnome.org/show_bug.cgi?id=772726
* include/libxml/parser.h: Add a new parser flag XML_PARSE_NOXXE
* elfgcchack.h, xmlIO.h, xmlIO.c: associated loading routine
* include/libxml/xmlerror.h: new error raised
* xmllint.c: adds --noxxe flag to activate the option
If entities expansion in the XML parser is asked for,
it is possble to craft relatively small input document leading
to excessive on-the-fly content generation.
This patch accounts for those replacement and stop parsing
after a given threshold. it can be bypassed as usual with the
HUGE parser option.
Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com>
* parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser
option not switch on by default, it's an opt-in
* SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers
in the psvi field of text nodes
* tree.c: expand xmlGetLineNo to extract those informations, also
make sure we can't fail on recursive behaviour
* error.c: in __xmlRaiseError, if a node is provided, call
xmlGetLineNo() if we can't get a valid line number.
* xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint
For https://bugzilla.gnome.org/show_bug.cgi?id=643148
Reported by Bill Clarke <llib@computer.org>, it used a global variable
as a counter for the input id and this was not thread safe. To avoid the
race without adding unneeded locking in the parser path, move the id to
the parser context instead.
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advace for example if the content had been converted
before being passed to the parser.
* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
* HTMLparser.c: new htmlParseElementInternal non recursive, with
htmlParseContentInternal and new function to handle node info
and element end.
* include/libxml/parser.h: add new stack for element info in parser
context
* parserInternals.c: fee element info stack
* HTMLparser.c: check when we see an head or a body tag and avoid
autogenerating them
* include/libxml/parser.h: the values for ctxt->html change depending
on the head or body tags being seen
* include/libxml/parser.h include/libxml/xmlwriter.h
include/libxml/relaxng.h include/libxml/xmlversion.h.in
include/libxml/xmlwin32version.h.in include/libxml/valid.h
include/libxml/xmlschemas.h include/libxml/xmlerror.h: change
ATTRIBUTE_PRINTF into LIBXML_ATTR_FORMAT to avoid macro name
collisions with other packages and headers as reported by
Belgabor and Mike Hommey
daniel
svn path=/trunk/; revision=3827
* include/libxml/parser.h include/libxml/xmlwriter.h
include/libxml/relaxng.h include/libxml/xmlversion.h.in
include/libxml/xmlwin32version.h.in include/libxml/valid.h
include/libxml/xmlschemas.h include/libxml/xmlerror.h:
port patch from Marcus Meissner to add gcc checking for
printf like functions parameters, should fix#65068
* doc/apibuild.py doc/*: modified the script accordingly
and regenerated
* xpath.c xmlmemory.c threads.c: fix a few warnings
Daniel
svn path=/trunk/; revision=3813
* parser.c include/libxml/parser.h: completely different fix for
the recursion detection based on entity density, big cleanups
in the entity parsing code too
* result/*.sax*: the parser should not ask for used defined versions
of the predefined entities
* testrecurse.c: automatic test for entity recursion checks
* Makefile.am: added testrecurse
* test/recurse/lol* test/recurse/good*: a first set of tests for
the recursion
Daniel
svn path=/trunk/; revision=3783
* include/libxml/parser.h parser.c xmllint.c: strengthen some
of the internal parser limits, add an XML_PARSE_HUGE option
to bypass them all. More internal parser limits will still need
to be added.
Daniel
svn path=/trunk/; revision=3777
* include/libxml/parser.h xinclude.c xmllint.c: patch based on
Wieant Nielander contribution to add the option of not doing
URI base fixup in XInclude
Daniel
svn path=/trunk/; revision=3775
* configure.in parser.c xmllint.c include/libxml/parser.h
include/libxml/xmlversion.h.in: applied patch from Andrew W. Nosenko
to expose if zlib support was compiled in, in the header, in the
feature API and in the xmllint --version output.
Daniel
* include/libxml/parser.h: Clarified in the docs that the tree
must not be tried to be modified if using the parser flag
XML_PARSE_COMPACT as suggested by Stefan Behnel
(#344390).
* include/libxml/parser.h parser.c xmllint.c: damn XML_FEATURE_UNICODE
clashes with Expat headers rename to XML_WITH_ to fix bug #316053.
* doc/Makefile.am: build devhelp before the examples.
* doc/*: regenerated the API
Daniel
* NEWS elfgcchack.h testapi.c doc/*: updated the docs and rebuild
releasing 2.6.21
* include/libxml/threads.h threads.c: removed xmlIsThreadsEnabled()
* threads.c include/libxml/threads.h xmllint.c: added the more
generic xmlHasFeature() as suggested by Bjorn Reese, xmllint uses it.
Daniel
* HTMLparser.c parser.c SAX2.c debugXML.c tree.c valid.c xmlreader.c
xmllint.c include/libxml/HTMLparser.h include/libxml/parser.h:
added a parser XML_PARSE_COMPACT option to allocate small
text nodes (less than 8 bytes on 32bits, less than 16bytes on 64bits)
directly within the node, various changes to cope with this.
* result/XPath/tests/* result/XPath/xptr/* result/xmlid/*: this
slightly change the output
Daniel
* error.c globals.c parser.c runtest.c testHTML.c testSAX.c
threads.c valid.c xmllint.c xmlreader.c xmlschemas.c xmlstring.c
xmlwriter.c include/libxml/parser.h include/libxml/relaxng.h
include/libxml/valid.h include/libxml/xmlIO.h
include/libxml/xmlerror.h include/libxml/xmlexports.h
include/libxml/xmlschemas.h: applied a patch from Marcus Boerger
to fix problems with calling conventions on Windows this should
fix#309757
Daniel
* testapi.c tree.c: fixing a leak detected by testapi in
xmlDOMWrapAdoptNode, and fixing another side effect in testapi
seems to pass tests fine now.
* include/libxml/parser.h parser.c: xmlStopParser() is no more limited
to push mode
* error.c: remove a warning
* runtest.c xmllint.c: avoid compilation errors if only some parts
of the library are compiled in.
Daniel
Synchronized the header files with the library code in order
to assure that all the various conditionals (LIBXML_xxxx_ENABLED)
were the same in both. Modified the API database content to more
accurately reflect the conditionals. Enhanced the generation
of that database. Although there was no substantial change to
any of the library code's logic, a large number of files were
modified to achieve the above, and the configuration script
was enhanced to do some automatic enabling of features (e.g.
--with-xinclude forces --with-xpath). Additionally, all the format
errors discovered by apibuild.py were corrected.
* configure.in: enhanced cross-checking of options
* doc/apibuild.py, doc/elfgcchack.xsl, doc/libxml2-refs.xml,
doc/libxml2-api.xml, gentest.py: changed the usage of the
<cond> element in module descriptions
* elfgcchack.h, testapi.c: regenerated with proper conditionals
* HTMLparser.c, SAX.c, globals.c, tree.c, xmlschemas.c, xpath.c,
testSAX.c: cleaned up conditionals
* include/libxml/[SAX.h, SAX2.h, debugXML.h, encoding.h, entities.h,
hash.h, parser.h, parserInternals.h, schemasInternals.h, tree.h,
valid.h, xlink.h, xmlIO.h, xmlautomata.h, xmlreader.h, xpath.h]:
synchronized the conditionals with the corresponding module code
* doc/examples/tree2.c, doc/examples/xpath1.c, doc/examples/xpath2.c:
added additional conditions required for compilation
* doc/*.html, doc/html/*.html: rebuilt the docs
* debugXML.c: added help for new set shell command
* xinclude.c xmllint.c xmlreader.c include/libxml/parser.h:
added parser option to not generate XInclude start/end nodes,
added a specific option to xmllint to test it fixes#130769
* Makefile.am: regression test the new feature
* doc/xmllint.1 doc/xmllint.xml: updated man page to document option.
Daniel
* xmlIO.c: small typo pointed out by Mike Hommey
* doc/xmllint.xml, xmllint.html, xmllint.1: slightly improved
the --c14n description, c.f. #144675 .
* nanohttp.c nanoftp.c: applied a first simple patch from
Mike Hommey for $no_proxy, c.f. #133470
* parserInternals.c include/libxml/parserInternals.h
include/libxml/xmlerror.h: cleanup to avoid 'error' identifier
in includes #
* parser.c SAX2.c debugXML.c include/libxml/parser.h:
first version of the inplementation of parsing within
the context of a node in the tree #142359, new function
xmlParseInNodeContext(), added support at the xmllint --shell
level as the "set" function
* test/scripts/set* result/scripts/* Makefile.am: extended
the script based regression tests to instrument the new function.
Daniel
* parser.c xmlreader.c include/libxml/parser.h: fixed a serious
problem when substituing entities using the Reader, the entities
content might be freed and if rereferenced would crash
* Makefile.am test/* result/*: added a new test case and a new
test operation for the reader with substitution of entities.
Daniel
* parserInternals.c xmlIO.c encoding.c include/libxml/parser.h
include/libxml/xmlIO.h: added xmlByteConsumed() interface
* doc/*: updated the benchmark rebuilt the docs
* python/tests/Makefile.am python/tests/indexes.py: added a
specific regression test for xmlByteConsumed()
* include/libxml/encoding.h rngparser.c tree.c: small cleanups
Daniel
* encoding.c, parser.c, xmlstring.c, Makefile.am,
include/libxml/Makefile.am, include/libxml/catalog.c,
include/libxml/chvalid.h, include/libxml/encoding.h,
include/libxml/parser.h, include/libxml/relaxng.h,
include/libxml/tree.h, include/libxml/xmlwriter.h,
include/libxml/xmlstring.h:
moved string and UTF8 routines out of parser.c and encoding.c
into a new module xmlstring.c with include file
include/libxml/xmlstring.h mostly using patches from Reid
Spencer. Since xmlChar now defined in xmlstring.h, several
include files needed to have a #include added for safety.
* doc/apibuild.py: added some additional sorting for various
references displayed in the APIxxx.html files. Rebuilt the
docs, and also added new file for xmlstring module.
* configure.in: small addition to help my testing; no effect on
normal usage.
* doc/search.php: added $_GET[query] so that persistent globals
can be disabled (for recent versions of PHP)