libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

Author	SHA1	Message	Date
Pranjal Jumde	0bcd05c5cd	Heap-based buffer overread in htmlCurrentChar For https://bugzilla.gnome.org/show_bug.cgi?id=758606 * parserInternals.c: (xmlNextChar): Add an test to catch other issues on ctxt->input corruption proactively. For non-UTF-8 charsets, xmlNextChar() failed to check for the end of the input buffer and would continuing reading. Fix this by pulling out the check for the end of the input buffer into common code, and return if we reach the end of the input buffer prematurely. * result/HTML/758606.html: Added. * result/HTML/758606.html.err: Added. * result/HTML/758606.html.sax: Added. * result/HTML/758606_2.html: Added. * result/HTML/758606_2.html.err: Added. * result/HTML/758606_2.html.sax: Added. * test/HTML/758606.html: Added test case. * test/HTML/758606_2.html: Added test case.	2016-05-23 15:01:07 +08:00
Hugh Davenport	beca86e8c8	Detect change of encoding when parsing HTML names From https://bugzilla.gnome.org/show_bug.cgi?id=758518 Happens when a file has a name getting parsed, but no valid encoding set, so libxml has to guess what the encoding is. This patch detects when the buffer location changes, and if it does, restarts the parsing of the name. This slightly change a couple of regression tests output	2016-05-23 15:01:07 +08:00
Pranjal Jumde	a820dbeac2	Bug 758605: Heap-based buffer overread in xmlDictAddString <https://bugzilla.gnome.org/show_bug.cgi?id=758605 > Reviewed by David Kilzer. * HTMLparser.c: (htmlParseName): Add bounds check. (htmlParseNameComplex): Ditto. * result/HTML/758605.html: Added. * result/HTML/758605.html.err: Added. * result/HTML/758605.html.sax: Added. * runtest.c: (pushParseTest): The input for the new test case was so small (4 bytes) that htmlParseChunk() was never called after htmlCreatePushParserCtxt(), thereby creating a false positive test failure. Fixed by using a do-while loop so we always call htmlParseChunk() at least once. * test/HTML/758605.html: Added.	2016-05-23 15:01:07 +08:00
Daniel Veillard	f933c89813	Keep non-significant blanks node in HTML parser For https://bugzilla.gnome.org/show_bug.cgi?id=681822 Regardless if the option HTML_PARSE_NOBLANKS is set or not, blank nodes are removed from a HTML document, for example: <html> <head> <title>This is a test.</title> </head> <body> <p>This is a test.</p> </body> </html> is read as: <html><head><title>This is a test.</title></head><body> <p>This is a test.</p> </body></html> This changes the default behaviour but the old behaviour is available as expected when using the parser flag HTML_PARSE_NOBLANKS Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com> * HTMLparser.c: change various places in the parser where ignorable_space SAX callback was called without checking for the parser flag preference * xmllint.c: make sure we use the new flag even for HTML parsing * result/HTML/*: this modifies the output of a number of tests	2012-09-07 19:32:12 +08:00
Denis Pauk	a0cd075d94	HTML parser error with <noscript> in the <head> For https://bugzilla.gnome.org/show_bug.cgi?id=615785 When the <noscript> is found, <head> is closed and a <body> element is created. The real <body id="xxx"> gets skipped over, so I can't see any of the body's attributes. Just don't close <head> when encountering a <noscript> Add a regression test too	2012-05-11 19:31:12 +08:00
Denis Pauk	868d92da89	Add HTML parser support for HTML5 meta charset encoding declaration For https://bugzilla.gnome.org/show_bug.cgi?id=655218 http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element """ The charset attribute specifies the character encoding used by the document. This is a character encoding declaration. If the attribute is present in an XML document, its value must be an ASCII case-insensitive match for the string "UTF-8" (and the document is therefore forced to use UTF-8 as its encoding). """ However, while <meta http-equiv="Content-Type" content="text/html; charset=utf8"> works, <meta charset="utf8"> does not. While libxml2 HTML parser is not tuned for HTML5, this is a simple addition Also added a testcase	2012-05-10 15:34:57 +08:00
Daniel Veillard	3c080d6d72	Don't give default HTML boolean attribute values in parser * HTMLparser.c: don't default value of HTML boolean attributes in the parser * SAX2.c: move this to SAX2 tree building backend * result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax result/HTML/wired.html.sax: this changes a few HTML SAX regression tests	2010-03-15 15:47:50 +01:00
Daniel Veillard	a57ba4ce96	fix an HTML parsing error on large data sections reported by Mike Day add * HTMLparser.c: fix an HTML parsing error on large data sections reported by Mike Day * test/HTML/utf8bug.html result/HTML/utf8bug.html.err result/HTML/utf8bug.html.sax result/HTML/utf8bug.html: add the reproducer to the test suite daniel svn path=/trunk/; revision=3797	2008-09-25 16:06:18 +00:00
Daniel Veillard	42720248e6	change the way script/style are parsed to not try to detect comments, * HTMLparser.c: change the way script/style are parsed to not try to detect comments, reported by Mike Day * result/HTML/doc3.*: affects the result of that test Daniel svn path=/trunk/; revision=3598	2007-04-16 07:02:31 +00:00
Daniel Veillard	c47d263049	fixing HTML minimized attribute values to be generated internally if not * HTMLparser.c: fixing HTML minimized attribute values to be generated internally if not present, fixes bug #332124 * result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax result/HTML/wired.html.sax: this affects the SAX event strem for a few test cases Daniel	2006-10-17 16:13:27 +00:00
Daniel Veillard	48519092e5	fixing HTML entities in attributes parsing bug #362552 added to the * HTMLparser.c: fixing HTML entities in attributes parsing bug #362552 * result/HTML/entities2.html* test/HTML/entities2.html: added to the regression suite Daniel	2006-10-17 15:56:35 +00:00
Daniel Veillard	b990008f05	script HTML parser error fix, corrects bug #319715 added test from Michael * HTMLparser.c: script HTML parser error fix, corrects bug #319715 * result/HTML/53867* test/HTML/53867.html: added test from Michael Day to the regression suite Daniel	2005-10-25 12:36:29 +00:00
Daniel Veillard	36d73403ff	Applied the last patch from Gary Coady for #304637 changing the behaviour * HTMLparser.c: Applied the last patch from Gary Coady for #304637 changing the behaviour when text nodes are found in body * result/HTML/*: this changes the output of some tests Daniel	2005-09-01 09:52:30 +00:00
Daniel Veillard	b8c8016044	fixed bug #310333 with a patch close to the provided patch for HTML UTF-8 * HTMLtree.c: fixed bug #310333 with a patch close to the provided patch for HTML UTF-8 serialization * result/HTML/script2.html: this changed the output of that test Daniel	2005-08-08 13:46:45 +00:00
Daniel Veillard	358fef4b1e	applied UTF-8 script parsing bug #310229 fix from Jiri Netolicky added the * HTMLparser.c: applied UTF-8 script parsing bug #310229 fix from Jiri Netolicky * result/HTML/script2.html* test/HTML/script2.html: added the test case from the regression suite Daniel	2005-07-13 16:37:38 +00:00
Daniel Veillard	597f1c1f34	applied patch from James Bursa fixing an html parsing bug in push mode * HTMLparser.c: applied patch from James Bursa fixing an html parsing bug in push mode * result/HTML/repeat.html* test/HTML/repeat.html: added the test to the regression suite Daniel	2005-07-03 23:00:18 +00:00
Daniel Veillard	fc484dd0a0	added support for HTML PIs #156087 added specific tests Daniel * HTMLparser.c: added support for HTML PIs #156087 * test/HTML/python.html result/HTML/python.html*: added specific tests Daniel	2004-10-22 14:34:23 +00:00
Daniel Veillard	18a65095e0	fix to the fix for #141864 from Paul Elseth apply fix from David Gatwood * xmlIO.c: fix to the fix for #141864 from Paul Elseth * HTMLparser.c result/HTML/doc3.htm: apply fix from David Gatwood for #141195 about text between comments. Daniel	2004-05-11 15:57:42 +00:00
Daniel Veillard	42fd412637	change --html to make sure we use the HTML serialization rule by default * xmllint.c: change --html to make sure we use the HTML serialization rule by default when HTML parser is used, add --xmlout to allow to force the XML serializer on HTML. * HTMLtree.c: ugly tweak to fix the output on <p> element and solve #125093 * result/HTML/*: this changes the output of some tests Daniel	2003-11-04 08:47:48 +00:00
Daniel Veillard	652f9aa966	Fix #124907 by simply backporting the same fix as for the XML parser * HTMLparser.c: Fix #124907 by simply backporting the same fix as for the XML parser * result/HTML/doc3.htm.err: change to ID detecting modified one test result. Daniel	2003-10-28 22:04:45 +00:00
Daniel Veillard	05bcb7ed30	fixed to not send NULL to %s printing cleaning up some of the regression * HTMLparser.c: fixed to not send NULL to %s printing * python/tests/error.py result/HTML/doc3.htm.err result/HTML/test3.html.err result/HTML/wired.html.err result/valid/t8.xml.err result/valid/t8a.xml.err: cleaning up some of the regression tests error Daniel	2003-10-19 14:26:34 +00:00
Daniel Veillard	f403d298c3	more code cleanup, especially around error messages, the HTML parser has * HTMLparser.c Makefile.am legacy.c parser.c parserInternals.c include/libxml/xmlerror.h: more code cleanup, especially around error messages, the HTML parser has now been upgraded to the new handling. * result/HTML/*: a few changes in the resulting error messages Daniel	2003-10-05 13:51:35 +00:00
Daniel Veillard	4b1577f14a	removing the SAXresults tree, keeping result in the same tree, added * Makefile.am results/.sax SAXResult/: removing the SAXresults tree, keeping result in the same tree, added SAXtests to the default "make tests" Daniel	2003-09-03 13:10:37 +00:00
Daniel Veillard	20aa0fb478	fixed a small problem in the patch for #118763 this reverts back to the * tree.c: fixed a small problem in the patch for #118763 * result/HTML/doc3.htm*: this reverts back to the previous result Daniel	2003-08-04 19:43:15 +00:00
Daniel Veillard	39057f40d6	fixing HTML attribute serialization bug #118763 applying a modified * tree.c: fixing HTML attribute serialization bug #118763 applying a modified version of the patch from Bacek * result/HTML/doc3.htm*: this modifies the output from one test Daniel	2003-08-04 01:33:43 +00:00
Daniel Veillard	8265a18a6a	do not generate &quot; for " outside of attributes this changes the output * entities.c: do not generate &quot; for " outside of attributes * result//*: this changes the output of some tests Daniel	2003-06-13 10:05:56 +00:00
William M. Brack	3b811174f7	Updated testfiles for error.c fix	2003-05-14 02:53:43 +00:00
Daniel Veillard	ef0b450163	fixed some problems related to #75813 about handling of Result Value Trees * xpath.c: fixed some problems related to #75813 about handling of Result Value Trees Daniel	2003-03-24 13:57:34 +00:00
Daniel Veillard	77a90a7f8e	patch from johan@evenhuis.nl for #107937 fixing some line counting * HTMLparser.c parser.c parserInternals.c: patch from johan@evenhuis.nl for #107937 fixing some line counting problems, and some other cleanups. * result/HTML/: this result in some line number changes Daniel	2003-03-22 00:04:05 +00:00
Daniel Veillard	fee408f5eb	final touch at closing #87235 </p> end tags need to be generated. this * HTMLparser.c: final touch at closing #87235 </p> end tags need to be generated. * result/HTML/cf_128.html result/HTML/test2.html result/HTML/test3.html: this change slightly the output of a few tests * doc/*: regenerated Daniel	2002-11-22 13:18:30 +00:00
Daniel Veillard	ce02dbc430	Mikhail Sogrine pointed out a bug in HTML parsing, applied his patch added * HTMLparser.c: Mikhail Sogrine pointed out a bug in HTML parsing, applied his patch * result/HTML/attrents.html result/HTML/attrents.html.err result/HTML/attrents.html.sax test/HTML/attrents.html: added the test and result case provided by Mikhail Sogrine Daniel	2002-10-22 19:14:58 +00:00
Daniel Veillard	8c9872ca2e	trying to fix 87235 about discarded white spaces in the HTML parser. this * HTMLparser.c: trying to fix 87235 about discarded white spaces in the HTML parser. * result/HTML/*: this changes the output of a number of HTML regression tests Daniel	2002-07-05 18:17:10 +00:00
Daniel Veillard	6231e84559	fixed & serialization bug introduced in 2.4.20 this changes a few things * HTMLtree.c: fixed & serialization bug introduced in 2.4.20 * result/HTML/*: this changes a few things in the results Daniel	2002-04-18 11:54:04 +00:00
Daniel Veillard	eb475a37df	fixing bug #78662 i.e. add proper escaping of URI when saving HTML files. * HTMLtree.c uri.c: fixing bug #78662 i.e. add proper escaping of URI when saving HTML files. * result/HTML/*: this impacted some tests Daniel	2002-04-14 22:00:22 +00:00
Daniel Veillard	c1f78343b6	fix comment in scripts element parsing. updated the results. Daniel * HTMLparser.c: fix comment in scripts element parsing. * result/HTML/doc3*: updated the results. Daniel	2001-11-10 11:43:05 +00:00
Daniel Veillard	957fdcf2a3	handle the case of < in quoted attributes, Bastian Kleineidam Daniel * HTMLparser.c test/HTML/lt.html result/HTML/lt.html*: handle the case of < in quoted attributes, Bastian Kleineidam Daniel	2001-11-06 22:50:19 +00:00
Daniel Veillard	166982816e	do not output hexadecimal charrefs when serializing HTML since some * encoding.c entities.c: do not output hexadecimal charrefs when serializing HTML since some version of Netscape can't grok it, generate decimal ones. * result/HTML/doc3.htm: output changed due to previous test * parserInternals.c: repair xmlKeepBlanksDefault() broken in 2.4.4 Daniel	2001-09-14 10:29:27 +00:00
Daniel Veillard	02bb170a8b	- HTMLparser.[ch] HTMLtree.c: stored the inline/block property of element and use it to avoid outputting formatting spaces at the wrong place. Implemented the format parameter for HTML save. - result/HTML/doc2.htm result/HTML/doc3.htm result/HTML/fp40.htm result/HTML/script.html result/HTML/test2.html result/HTML/test3.html result/HTML/wired.html: of course this impact the result of a number of HTML tests Daniel	2001-06-13 21:11:59 +00:00
Daniel Veillard	f0c5376a03	- HTMLtree.c: when in a pre element no formatting space should be added. - test/HTML/pre.html result/HTML/pre.html*: added a regression test Daniel	2001-06-07 16:07:07 +00:00
Daniel Veillard	f69bb4b5bf	- HTMLparser.c: Closed bug #54891 - result/HTML/cf_128.html* test/HTML/cf_128.html: added the test to the suite forgot to commit this one yesterday - encoding.h hash.c nanoftp.h parser.h tree.h uri.h xlink.h xpointer.c: applied a documentation patch from LotR and filled in a few missing descriptions Daniel	2001-05-19 13:24:56 +00:00
Daniel Veillard	0a2a163d2e	- HTMLparser.c: Patch from Jonas Borgström (htmlGetEndPriority): New function, returns the priority of a certain element. (htmlAutoCloseOnClose): Only close inline elements if they all have lower or equal priority. - result/HTML: this of course changed a number of tests results. Daniel	2001-05-11 14:18:03 +00:00
Daniel Veillard	a2bc368bc9	- HTMLparser.c: trying to fix the problem reported by Jonas Borgström - results/HTML/ : a few changes in the output of the HTML tests as a result. - configure.in: tying to fix -liconv where needed Daniel	2001-05-03 08:27:20 +00:00
Daniel Veillard	56098d4f35	- HTMLparser.c : HTML parsing still sucks ... trying to deal with madness - result/HTML/ : this modified the result of the regression tests a lot. Daniel	2001-04-24 12:51:09 +00:00
Daniel Veillard	a3bfca59bf	parsing real HTML is a nightmare. - HTMLparser.c result/HTML/*: revamped the way the HTML parser handles end of tags or end of input Daniel	2001-04-12 15:42:58 +00:00
Daniel Veillard	760f4426f7	Couple of fixes, getting ready for 2.3.1: - configure.in: applied patch from Daniel van Balen for OpenBSD and bumped version to 2.3.1 - HTMLtree.c result/HTML/doc3.htm result/HTML/wired.html: the attempt to find autoclosing was simply broken, removed it, updated the examples, this is better Daniel	2001-02-15 14:59:48 +00:00
Daniel Veillard	f41fbbf6a9	testing and bug fixing related to XSLT: - xpath.c result/XPath/tests/chaptersprefol: bugfixes on order and on predicate - HTMLparser.[ch] HTMLtree.c result/HTML/doc3.htm.err result/HTML/doc3.htm.sax result/HTML/wired.html: sometimes one really want to have tags closed on output even if we accept unclosed ones on input Daniel	2001-02-13 17:05:35 +00:00
Daniel Veillard	f62ceffb7e	General fixes, XPointer improvements: - HTMLparser.c: some fixes on auto-open of html/head/body - encoding.c: fixed a compilation error on some gcc env - xpath.c xpointer.[ch] xpathInternals.h: improved the XPointer implementation - test/XPath/xptr/strpoint test/XPath/xptr/strrange3: added related XPointer tests and associated results Daniel	2000-11-24 23:36:01 +00:00
Daniel Veillard	c4f4f0b76f	- xpath.c: fixed the root evaluation problems - HTMLparser.c result/HTML/doc3.htm: fixed the problem of non ignorable spaces with <b> <bold> <em> - tree.c: fixed a loop in xmlSearchNsByHref() Daniel	2000-10-29 17:46:30 +00:00
Daniel Veillard	126f27992d	Bunch of fixes, finishing moving datastructures to the hash stuff: - hash.[ch] debugXML.c: expanded/enhanced the API, added multikey tuples, made hash structure opaque - valid.[ch]: moved elements, attributes, notations decalarations as well as ID and refs to hash tables. - entities.c: hash cleanup - xmlmemory.c: fixed a dump problem in debug mode - include/Makefile.am: problem passing in DESTDIR= values patch from Marc Christensen <marc@calderasystems.com> - nanohttp.c: removed debugging remains - HTMLparser.c: the bogus tag should be ignored (Wayne) - HTMLparser.c parser.c: fixing a number of problems with the macros in the parser.c files (Wayne). - HTMLparser.c: close the previous option when opening a new one (Marc Sanfacon). - result/HTML/: updated the HTML results accordingly Daniel	2000-10-24 17:10:12 +00:00
Daniel Veillard	7eda8452f8	- HTMLparser.c HTMLtree.[ch] SAX.c testHTML.c tree.c: fixed HTML support for SCRIPT and STYLE with help from Bjorn Reese - test/HTML/* result/HTML/*: added simple testcase and updated the existing ones. Daniel	2000-10-14 23:38:43 +00:00

72 Commits