libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	59b3366178	error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.	2022-12-27 14:41:19 +01:00
Nick Wellnhofer	ce76ebfd13	entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	463bbeeca1	entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	ce9baf94d5	Remove XMLCALL and XMLCDECL macros from public headers	2022-12-08 02:48:27 +01:00
Nick Wellnhofer	68a6518c45	parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.	2022-11-20 21:27:08 +01:00
Nick Wellnhofer	65dc8a63ac	Make xmlNewSAXParserCtx take a const sax handler Also improve documentation.	2022-09-01 00:17:45 +02:00
Nick Wellnhofer	51035c539e	Generate deprecation warnings for old SAX API	2022-08-25 20:17:03 +02:00
Nick Wellnhofer	9a82b94a94	Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt Add API functions to create a parser context with a custom SAX handler without having to mess with ctxt->sax manually.	2022-08-24 14:07:55 +02:00
Nick Wellnhofer	4a8c71eb7c	Remove DOCBparser This code has been broken and deprecated since version 2.6.0, released in 2003. Because of a bug in commit `961b535c`, DOCBparser.c was never compiled since 2012. I couldn't find a Debian package using any of its symbols, so it seems safe to remove this module.	2022-03-04 22:56:21 +01:00
Nick Wellnhofer	ebb1797030	Remove unneeded #includes	2022-03-04 22:11:49 +01:00
Nick Wellnhofer	cf4893f7b3	Deprecate legacy functions	2022-02-20 21:49:04 +01:00
Nick Wellnhofer	ce00c36e65	Store per-element parser state in a struct Make the parser context's "pushTab" point to an array of structs instead of void pointers. This avoids casting unrelated types to void pointers, improving readability and portability, and allows for more efficient packing. Ultimately, the struct could be extended to include the contents of "nameTab" and "spaceTab", further simplifying the code. Historically, "pushTab" was only used by the push parser (hence the name), so the change to the public headers should be safe. Also remove an unused parameter from xmlParseEndTag2.	2021-05-08 22:16:49 +02:00
Nick Wellnhofer	438e595a8c	Stop counting nbChars in parser context The value was inaccurate and never used.	2020-08-09 15:01:45 +02:00
Nick Wellnhofer	20c60886e4	Fix typos Resolves #133.	2020-03-08 17:41:53 +01:00
Jared Yanovich	2a350ee9b4	Large batch of typo fixes Closes #109.	2019-09-30 18:04:38 +02:00
Nick Wellnhofer	030b1f7a27	Revert "Add an XML_PARSE_NOXXE flag to block all entities loading even local" This reverts commit `2304078555`. The new flag doesn't work and the change even broke the XML_PARSE_NONET option.	2017-06-06 15:53:42 +02:00
Doran Moppert	2304078555	Add an XML_PARSE_NOXXE flag to block all entities loading even local For https://bugzilla.gnome.org/show_bug.cgi?id=772726 * include/libxml/parser.h: Add a new parser flag XML_PARSE_NOXXE * elfgcchack.h, xmlIO.h, xmlIO.c: associated loading routine * include/libxml/xmlerror.h: new error raised * xmllint.c: adds --noxxe flag to activate the option	2017-04-07 16:55:05 +02:00
Jan Pokorný	bb654feb9a	Fix typos: dictio{ nn -> n }ar{y,ies} Signed-off-by: Jan Pokorný <jpokorny@redhat.com>	2016-04-15 22:22:48 +08:00
Daniel Veillard	23f05e0c33	Detect excessive entities expansion upon replacement If entities expansion in the XML parser is asked for, it is possible to craft a relatively small input document leading to excessive on-the-fly content generation. This patch accounts for those replacements and stops parsing after a given threshold. It can be bypassed as usual with the HUGE parser option.	2013-02-19 10:21:49 +08:00
Daniel Veillard	f8e3db0445	Big space and tab cleanup Remove all space before tabs and space and tabs at end of lines.	2012-09-11 13:26:36 +08:00
Daniel Veillard	968a03a2e5	Add support for big line numbers in error reporting Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com> * parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser option not switch on by default, it's an opt-in * SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers in the psvi field of text nodes * tree.c: expand xmlGetLineNo to extract those informations, also make sure we can't fail on recursive behaviour * error.c: in __xmlRaiseError, if a node is provided, call xmlGetLineNo() if we can't get a valid line number. * xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint	2012-08-13 12:41:33 +08:00
Daniel Veillard	0d51cfebc9	Fix a race in xmlNewInputStream For https://bugzilla.gnome.org/show_bug.cgi?id=643148 Reported by Bill Clarke <llib@computer.org>, it used a global variable as a counter for the input id and this was not thread safe. To avoid the race without adding unneeded locking in the parser path, move the id to the parser context instead.	2012-05-15 11:18:40 +08:00
Anders F Bjorklund	eae5261779	add lzma compression support	2012-01-27 22:19:52 +08:00
Daniel Veillard	c62efc847c	Add options to ignore the internal encoding For both XML and HTML, the document can provide an encoding either in XMLDecl in XML, or as a meta element in HTML head. This adds options to ignore those encodings if the encoding is known in advance, for example if the content had been converted before being passed to the parser. * parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option for XML parsing * include/libxml/HTMLparser.h HTMLparser.c: adds the HTML_PARSE_IGNORE_ENC for HTML parsing * HTMLtree.c: fix the handling of saving when an unknown encoding is defined in meta document header * xmllint.c: add a --noenc option to activate the new parser options	2011-05-26 11:47:37 +08:00
Giuseppe Iuculano	48f7dcb724	480323 add code to plug in ICU converters by default This is not configured in by default but after some serious massaging incorporate that patch from Chromium/Chrome.	2010-11-04 17:42:42 +01:00
Eugene Pimenov	615904f582	Switch the HTML parser to be non-recursive * HTMLparser.c: new htmlParseElementInternal non recursive, with htmlParseContentInternal and new function to handle node info and element end. * include/libxml/parser.h: add new stack for element info in parser context * parserInternals.c: free element info stack	2010-03-15 15:16:02 +01:00
Daniel Veillard	029a04d265	541335 HTML avoid creating 2 head or 2 body element * HTMLparser.c: check when we see an head or a body tag and avoid autogenerating them * include/libxml/parser.h: the values for ctxt->html change depending on the head or body tags being seen	2009-08-24 12:50:23 +02:00
Daniel Veillard	f39eafaa90	Make xmlRecoverDoc const (Martin Trappel) * include/libxml/parser.h parser.c: just make the parameter a const	2009-08-20 19:15:08 +02:00
Daniel Veillard	f076f348c4	change ATTRIBUTE_PRINTF into LIBXML_ATTR_FORMAT to avoid macro name * include/libxml/parser.h include/libxml/xmlwriter.h include/libxml/relaxng.h include/libxml/xmlversion.h.in include/libxml/xmlwin32version.h.in include/libxml/valid.h include/libxml/xmlschemas.h include/libxml/xmlerror.h: change ATTRIBUTE_PRINTF into LIBXML_ATTR_FORMAT to avoid macro name collisions with other packages and headers as reported by Belgabor and Mike Hommey daniel svn path=/trunk/; revision=3827	2009-04-15 09:20:25 +00:00
Daniel Veillard	f63085de5e	port patch from Marcus Meissner to add gcc checking for printf like * include/libxml/parser.h include/libxml/xmlwriter.h include/libxml/relaxng.h include/libxml/xmlversion.h.in include/libxml/xmlwin32version.h.in include/libxml/valid.h include/libxml/xmlschemas.h include/libxml/xmlerror.h: port patch from Marcus Meissner to add gcc checking for printf like functions parameters, should fix #65068 * doc/apibuild.py doc/: modified the script accordingly and regenerated xpath.c xmlmemory.c threads.c: fix a few warnings Daniel svn path=/trunk/; revision=3813	2009-01-18 20:53:59 +00:00
Rob Richards	b9ed017d31	add XML_PARSE_OLDSAX parser option to enable pre 2.7 SAX behavior. * include/libxml/parser.h parser.c: add XML_PARSE_OLDSAX parser option to enable pre 2.7 SAX behavior. svn path=/trunk/; revision=3807	2009-01-05 17:28:50 +00:00
Daniel Veillard	0161e638c6	completely different fix for the recursion detection based on entity * parser.c include/libxml/parser.h: completely different fix for the recursion detection based on entity density, big cleanups in the entity parsing code too * result/.sax: the parser should not ask for used defined versions of the predefined entities * testrecurse.c: automatic test for entity recursion checks * Makefile.am: added testrecurse * test/recurse/lol* test/recurse/good*: a first set of tests for the recursion Daniel svn path=/trunk/; revision=3783	2008-08-28 15:36:32 +00:00
Daniel Veillard	8915c150b5	strengthen some of the internal parser limits, add an XML_PARSE_HUGE * include/libxml/parser.h parser.c xmllint.c: strengthen some of the internal parser limits, add an XML_PARSE_HUGE option to bypass them all. More internal parser limits will still need to be added. Daniel svn path=/trunk/; revision=3777	2008-08-26 13:05:34 +00:00
Daniel Veillard	54bd29b79b	patch based on Wieant Nielander contribution to add the option of not * include/libxml/parser.h xinclude.c xmllint.c: patch based on Wieant Nielander contribution to add the option of not doing URI base fixup in XInclude Daniel svn path=/trunk/; revision=3775	2008-08-26 07:26:55 +00:00
Daniel Veillard	4bf899bf1b	fix for CVE-2008-3281 Daniel * include/libxml/parser.h include/libxml/entities.h entities.c parserInternals.c parser.c: fix for CVE-2008-3281 Daniel svn path=/trunk/; revision=3772	2008-08-20 17:04:30 +00:00
Daniel Veillard	34e3f64191	implement XML-1.0 5th edition, add parser option XML_PARSE_OLD10 to stick * include/libxml/parser.h include/libxml/xmlerror.h parser.c: implement XML-1.0 5th edition, add parser option XML_PARSE_OLD10 to stick to old behaviour * testapi.c gentest.py: modified slightly and regenerated * Makefile.am: add testchar Daniel svn path=/trunk/; revision=3755	2008-07-29 09:02:27 +00:00
Daniel Veillard	75acfeea32	applied patch from Andrew W. Nosenko to expose if zlib support was * configure.in parser.c xmllint.c include/libxml/parser.h include/libxml/xmlversion.h.in: applied patch from Andrew W. Nosenko to expose if zlib support was compiled in, in the header, in the feature API and in the xmllint --version output. Daniel	2006-07-13 06:29:56 +00:00
Kasimier T. Buchcik	803e37ac2c	Clarified in the docs that the tree must not be tried to be modified if * include/libxml/parser.h: Clarified in the docs that the tree must not be tried to be modified if using the parser flag XML_PARSE_COMPACT as suggested by Stefan Behnel (#344390).	2006-06-09 19:46:46 +00:00
Daniel Veillard	602434dee5	damn XML_FEATURE_UNICODE clashes with Expat headers rename to XML_WITH_ to * include/libxml/parser.h parser.c xmllint.c: damn XML_FEATURE_UNICODE clashes with Expat headers rename to XML_WITH_ to fix bug #316053. * doc/Makefile.am: build devhelp before the examples. * doc/*: regenerated the API Daniel	2005-09-12 09:20:31 +00:00
Daniel Veillard	0bcc7f6ae9	updated the docs and rebuild releasing 2.6.21 removed * NEWS elfgcchack.h testapi.c doc/: updated the docs and rebuild releasing 2.6.21 include/libxml/threads.h threads.c: removed xmlIsThreadsEnabled() * threads.c include/libxml/threads.h xmllint.c: added the more generic xmlHasFeature() as suggested by Bjorn Reese, xmllint uses it. Daniel	2005-09-04 21:39:03 +00:00
Daniel Veillard	8874b94cd2	added a parser XML_PARSE_COMPACT option to allocate small text nodes (less * HTMLparser.c parser.c SAX2.c debugXML.c tree.c valid.c xmlreader.c xmllint.c include/libxml/HTMLparser.h include/libxml/parser.h: added a parser XML_PARSE_COMPACT option to allocate small text nodes (less than 8 bytes on 32bits, less than 16bytes on 64bits) directly within the node, various changes to cope with this. * result/XPath/tests/* result/XPath/xptr/* result/xmlid/*: this slightly change the output Daniel	2005-08-25 13:19:21 +00:00
Daniel Veillard	ffa3c74933	applied a patch from Marcus Boerger to fix problems with calling * error.c globals.c parser.c runtest.c testHTML.c testSAX.c threads.c valid.c xmllint.c xmlreader.c xmlschemas.c xmlstring.c xmlwriter.c include/libxml/parser.h include/libxml/relaxng.h include/libxml/valid.h include/libxml/xmlIO.h include/libxml/xmlerror.h include/libxml/xmlexports.h include/libxml/xmlschemas.h: applied a patch from Marcus Boerger to fix problems with calling conventions on Windows this should fix #309757 Daniel	2005-07-21 13:24:09 +00:00
Daniel Veillard	39e5c89016	fixing a leak detected by testapi in xmlDOMWrapAdoptNode, and fixing * testapi.c tree.c: fixing a leak detected by testapi in xmlDOMWrapAdoptNode, and fixing another side effect in testapi seems to pass tests fine now. * include/libxml/parser.h parser.c: xmlStopParser() is no more limited to push mode * error.c: remove a warning * runtest.c xmllint.c: avoid compilation errors if only some parts of the library are compiled in. Daniel	2005-07-03 22:48:50 +00:00
Daniel Veillard	7331e5cab8	fixed #172260 redundant assignment. fixed xmlSAXParseDoc() and * SAX.c: fixed #172260 redundant assignment. * parser.c include/libxml/parser.h: fixed xmlSAXParseDoc() and xmlParseDoc() signatures #172257. Daniel	2005-03-31 14:59:00 +00:00
William M. Brack	21e4ef20f6	Re-examined the problems of configuring a "minimal" library. Synchronized the header files with the library code in order to assure that all the various conditionals (LIBXML_xxxx_ENABLED) were the same in both. Modified the API database content to more accurately reflect the conditionals. Enhanced the generation of that database. Although there was no substantial change to any of the library code's logic, a large number of files were modified to achieve the above, and the configuration script was enhanced to do some automatic enabling of features (e.g. --with-xinclude forces --with-xpath). Additionally, all the format errors discovered by apibuild.py were corrected. * configure.in: enhanced cross-checking of options * doc/apibuild.py, doc/elfgcchack.xsl, doc/libxml2-refs.xml, doc/libxml2-api.xml, gentest.py: changed the usage of the <cond> element in module descriptions * elfgcchack.h, testapi.c: regenerated with proper conditionals * HTMLparser.c, SAX.c, globals.c, tree.c, xmlschemas.c, xpath.c, testSAX.c: cleaned up conditionals * include/libxml/[SAX.h, SAX2.h, debugXML.h, encoding.h, entities.h, hash.h, parser.h, parserInternals.h, schemasInternals.h, tree.h, valid.h, xlink.h, xmlIO.h, xmlautomata.h, xmlreader.h, xpath.h]: synchronized the conditionals with the corresponding module code * doc/examples/tree2.c, doc/examples/xpath1.c, doc/examples/xpath2.c: added additional conditions required for compilation * doc/.html, doc/html/.html: rebuilt the docs	2005-01-02 09:53:13 +00:00
Daniel Veillard	c14c3892a2	added help for new set shell command added parser option to not generate * debugXML.c: added help for new set shell command * xinclude.c xmllint.c xmlreader.c include/libxml/parser.h: added parser option to not generate XInclude start/end nodes, added a specific option to xmllint to test it fixes #130769 * Makefile.am: regression test the new feature * doc/xmllint.1 doc/xmllint.xml: updated man page to document option. Daniel	2004-08-16 12:34:50 +00:00
Daniel Veillard	29b1748205	small typo pointed out by Mike Hommey slightly improved the --c14n * xmlIO.c: small typo pointed out by Mike Hommey * doc/xmllint.xml, xmllint.html, xmllint.1: slightly improved the --c14n description, c.f. #144675 . * nanohttp.c nanoftp.c: applied a first simple patch from Mike Hommey for $no_proxy, c.f. #133470 * parserInternals.c include/libxml/parserInternals.h include/libxml/xmlerror.h: cleanup to avoid 'error' identifier in includes # * parser.c SAX2.c debugXML.c include/libxml/parser.h: first version of the implementation of parsing within the context of a node in the tree #142359, new function xmlParseInNodeContext(), added support at the xmllint --shell level as the "set" function * test/scripts/set* result/scripts/* Makefile.am: extended the script based regression tests to instrument the new function. Daniel	2004-08-16 00:39:03 +00:00
Daniel Veillard	0df3bc3f28	fixed a serious problem when substituting entities using the Reader, the * parser.c xmlreader.c include/libxml/parser.h: fixed a serious problem when substituting entities using the Reader, the entities content might be freed and if rereferenced would crash * Makefile.am test/* result/*: added a new test case and a new test operation for the reader with substitution of entities. Daniel	2004-06-08 12:03:41 +00:00
Daniel Veillard	3671190b54	added xmlByteConsumed() interface updated the benchmark rebuilt the docs * parserInternals.c xmlIO.c encoding.c include/libxml/parser.h include/libxml/xmlIO.h: added xmlByteConsumed() interface * doc/: updated the benchmark rebuilt the docs python/tests/Makefile.am python/tests/indexes.py: added a specific regression test for xmlByteConsumed() * include/libxml/encoding.h rngparser.c tree.c: small cleanups Daniel	2004-02-11 13:25:26 +00:00
William M. Brack	a2e844a3b3	moved string and UTF8 routines out of parser.c and encoding.c into a new * encoding.c, parser.c, xmlstring.c, Makefile.am, include/libxml/Makefile.am, include/libxml/catalog.c, include/libxml/chvalid.h, include/libxml/encoding.h, include/libxml/parser.h, include/libxml/relaxng.h, include/libxml/tree.h, include/libxml/xmlwriter.h, include/libxml/xmlstring.h: moved string and UTF8 routines out of parser.c and encoding.c into a new module xmlstring.c with include file include/libxml/xmlstring.h mostly using patches from Reid Spencer. Since xmlChar now defined in xmlstring.h, several include files needed to have a #include added for safety. * doc/apibuild.py: added some additional sorting for various references displayed in the APIxxx.html files. Rebuilt the docs, and also added new file for xmlstring module. * configure.in: small addition to help my testing; no effect on normal usage. * doc/search.php: added $_GET[query] so that persistent globals can be disabled (for recent versions of PHP)	2004-01-06 11:52:13 +00:00


160 Commits
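
Note on commit 463bbeeca1 (entities: Rework entity amplification checks): the message describes limiting the document size after entity substitution to 10 times the size before expansion, always allowing 10 MB of substitutions, charging a fixed cost per entity reference, and using saturation arithmetic on "unsigned long". The C sketch below illustrates that accounting scheme only; it is not the code from the commit. Every name in it (SketchAccounting, sketch_sat_add, sketch_sat_mul, sketch_account_reference) is hypothetical, the fixed cost of 20 is an assumed value, and reading "10 MB" as 10,000,000 bytes is also an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Factor taken from the commit message: expanded size may be at most
 * 10 times the input size. */
#define SKETCH_MAX_AMPLIFICATION 10UL
/* "10 MB of substitutions is always allowed"; the exact byte count is
 * an assumption made for this sketch. */
#define SKETCH_ALWAYS_ALLOWED (10UL * 1000UL * 1000UL)
/* Hypothetical fixed cost charged for each entity reference. */
#define SKETCH_ENT_FIXED_COST 20UL

/* Hypothetical accounting state; not a libxml2 structure. */
typedef struct {
    unsigned long input_size;    /* bytes of input consumed so far */
    unsigned long expanded_size; /* bytes produced by entity substitution */
} SketchAccounting;

/* Saturating addition: clamps to ULONG_MAX instead of wrapping. */
static unsigned long sketch_sat_add(unsigned long a, unsigned long b) {
    return (b > ULONG_MAX - a) ? ULONG_MAX : a + b;
}

/* Saturating multiplication: clamps to ULONG_MAX instead of wrapping. */
static unsigned long sketch_sat_mul(unsigned long a, unsigned long b) {
    if (a != 0 && b > ULONG_MAX / a)
        return ULONG_MAX;
    return a * b;
}

/* Charge one entity reference whose replacement text is replacement_len
 * bytes long. Returns 1 if the amplification limit is exceeded, else 0. */
static int sketch_account_reference(SketchAccounting *acc,
                                    unsigned long replacement_len) {
    unsigned long cost, limit;

    assert(acc != NULL);
    cost = sketch_sat_add(replacement_len, SKETCH_ENT_FIXED_COST);
    acc->expanded_size = sketch_sat_add(acc->expanded_size, cost);

    /* Small amounts of substitution are always allowed. */
    if (acc->expanded_size <= SKETCH_ALWAYS_ALLOWED)
        return 0;

    limit = sketch_sat_mul(acc->input_size, SKETCH_MAX_AMPLIFICATION);
    return acc->expanded_size > limit;
}
```

Because the allowed output is bounded by a constant multiple of the input plus a constant allowance, the total work stays linear in the input size, which is the property the commit message calls out. The fixed per-reference cost stands in for the older checks based on the number of entities, so that many references to tiny or empty entities still count toward the limit.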