libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-06-02 02:01:47 +03:00

Author	SHA1	Message	Date
Denis Pauk	fdf990c2ef	Allow to parse 1 byte HTML files For https://bugzilla.gnome.org/show_bug.cgi?id=605740 File 1 byte long were not accepted by the HTML push parser	2012-05-10 20:40:49 +08:00
Martin Schröder	b91111b475	Patch that fixes the skipping of the HTML_PARSE_NOIMPLIED flag For https://bugzilla.gnome.org/show_bug.cgi?id=642916 I just noticed that the HTML_PARSE_NOIMPLIED flag that you can pass to the HTML-Parser methods doesn't do anything. Its intended purpose is to stop the HTML-parser from forcibly adding a pair of html/body tags if the stream does not contain any. This is highly useful when you don't need this level of strictness. Unfortunately, specifying it doesn't work, because the option is not copied into the parsing context.	2012-05-10 18:52:37 +08:00
Lin Yi-Li	24464be639	Avoid memory leak if xmlParserInputBufferCreateIO fails For https://bugzilla.gnome.org/show_bug.cgi?id=643949 In case of error on an IO creation input the given context is terminated with the given close function, except if the error happened in xmlParserInputBufferCreateIO. This can lead to a resource leak which is fixed by this patch.	2012-05-10 16:14:55 +08:00
Denis Pauk	868d92da89	Add HTML parser support for HTML5 meta charset encoding declaration For https://bugzilla.gnome.org/show_bug.cgi?id=655218 http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element """ The charset attribute specifies the character encoding used by the document. This is a character encoding declaration. If the attribute is present in an XML document, its value must be an ASCII case-insensitive match for the string "UTF-8" (and the document is therefore forced to use UTF-8 as its encoding). """ However, while <meta http-equiv="Content-Type" content="text/html; charset=utf8"> works, <meta charset="utf8"> does not. While libxml2 HTML parser is not tuned for HTML5, this is a simple addition Also added a testcase	2012-05-10 15:34:57 +08:00
Pavel Andrejs	8ad4da5f56	HTML element position is not detected properly The data in node_seq in xmlParserCtxt was not updated properly when parsing HTML. This patch fixes the accounting for both pull and push mode of HTML parsing.	2012-05-08 11:01:12 +08:00
Daniel Veillard	c62efc847c	Add options to ignore the internal encoding For both XML and HTML, the document can provide an encoding either in XMLDecl in XML, or as a meta element in HTML head. This adds options to ignore those encodings if the encoding is known in advance for example if the content had been converted before being passed to the parser. * parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option for XML parsing * include/libxml/HTMLparser.h HTMLparser.c: adds the HTML_PARSE_IGNORE_ENC for HTML parsing * HTMLtree.c: fix the handling of saving when an unknown encoding is defined in meta document header * xmllint.c: add a --noenc option to activate the new parser options	2011-05-26 11:47:37 +08:00
Denis Pauk	91d239c5cf	617468 fix progressive HTML parsing with style using "'" Style and script can contain ',"". This patch fixes call htmlParseLookupSequence with set flag 'ignoreattrval' to ignore this char	2010-11-04 12:39:18 +01:00
Pierre Belzile	d4b5447141	614005 Possible erroneous HTML parsing on unterminated script Fix a nasty error handling problem when an error happen at the end of the input buffer.	2010-11-04 10:18:17 +01:00
Daniel Veillard	8ad2930f62	make sure htmlCtxtReset do reset the disableSAX field As pointed out by Stefan Behnel <stefan_ml@behnel.de>	2010-10-28 11:51:22 +02:00
Michael Day	af58ee130f	Fix a couple of typo in HTML parser error messages	2010-08-02 13:43:28 +02:00
Daniel Veillard	f1121c48af	Add an HTML parser option to avoid a default doctype - include/libxml/HTMLparser.h: defines the new HTML parser option HTML_PARSE_NODEFDTD - HTMLparser.c: if option is set don't add a default DTD - xmllint.c: add the corresponding --nodefdtd option in xmllint	2010-07-26 14:02:42 +02:00
Daniel Veillard	06c93b7509	Remove a few warnings	2010-03-15 16:08:44 +01:00
Daniel Veillard	3c080d6d72	Don't give default HTML boolean attribute values in parser * HTMLparser.c: don't default value of HTML boolean attributes in the parser * SAX2.c: move this to SAX2 tree building backend * result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax result/HTML/wired.html.sax: this changes a few HTML SAX regression tests	2010-03-15 15:47:50 +01:00
Eugene Pimenov	615904f582	Switch the HTML parser to be non-recursive * HTMLparser.c: new htmlParseElementInternal non recursive, with htmlParseContentInternal and new function to handle node info and element end. * include/libxml/parser.h: add new stack for element info in parser context * parserInternals.c: free element info stack	2010-03-15 15:16:02 +01:00
Eugene Pimenov	ef9c636ac1	Cleanup a couple of weirdness in HTML parser	2010-03-15 11:37:48 +01:00
Eugene Pimenov	1e60fbcb6f	htmlCheckEncoding doesn't update input-end after shrink * HTMLparser.c: add the missing update to the end pointer	2010-03-10 18:10:49 +01:00
Daniel Veillard	e20fb5a72c	Fix xmlParseInNodeContext for HTML content xmlParseInNodeContext notices that the enclosing document is an HTML document, so invoke the HTML parser for that fragment, and the HTML parser finding a "<p>hello world!</p>" document automatically augments it with defaulted <html> and <body>. This defaulting should be turned off in the HTML parser for this to work, but there is no such HTML parser option. There is an htmlOmittedDefaultValue global variable that you could use, but really we should not rely on global variable for processing options anymore, best is to add an HTML_PARSE_NOIMPLIED. * include/libxml/HTMLparser.h: add the HTML_PARSE_NOIMPLIED parser flag * HTMLparser.c: don't add implied elements if HTML_PARSE_NOIMPLIED is set * parser.c: add HTML_PARSE_NOIMPLIED to options for xmlParseInNodeContext on HTML documents	2010-01-29 20:47:08 +01:00
Eugene Pimenov	4b41f15dcd	Fix some missing commas in HTML element lists * HTMLparser.c: fix the macros BLOCK and INLINE to use commas and avoid transparent concatenation of strings	2010-01-20 14:25:59 +01:00
Daniel Veillard	13cee4e37b	Fix a bunch of scan 'dead increments' and cleanup * HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c: fix unused variables, or unneeded increments as well as a couple of space issues * runtest.c: check for NULL before calling unlink()	2009-09-05 14:52:55 +02:00
Daniel Veillard	eeb9932990	444994 HTML chunked failure for attribute with <> * HTMLparser.c: fix htmlParseLookupSequence to not save ctxt->checkIndex when the current buffer ends within an attribute value, as this information would be missed in next pass.	2009-08-25 14:42:16 +02:00
Adiel Mittmann	8a103793f2	Non ASCII character may be split at buffer end * HTMLparser.c: make sure when we call xmlParserInputGrow in htmlCurrentChar, to reset the current pointer	2009-08-25 11:27:13 +02:00
Markus Kull	56a03035bf	572129 speed up parsing of large HTML text nodes * HTMLparser.c: use a different lookup function htmlParseLookupChars() to avoid the quadratic behaviour	2009-08-24 19:00:23 +02:00
Daniel Veillard	b468f7444c	Remove a pedantic warning	2009-08-24 18:45:33 +02:00
Daniel Veillard	856c668c1a	Fix HTML parsing with 0 character in CDATA * HTMLparser.c: 0 before the end of the input need some special case handling, raise the error and return a space instead	2009-08-24 18:16:56 +02:00
Daniel Veillard	029a04d265	541335 HTML avoid creating 2 head or 2 body element * HTMLparser.c: check when we see an head or a body tag and avoid autogenerating them * include/libxml/parser.h: the values for ctxt->html change depending on the head or body tags being seen	2009-08-24 12:50:23 +02:00
Daniel Veillard	6339c1a886	541237 error correcting missing end tags in HTML * HTMLparser.c: make sure /p closes the FONTSTYLE list of elements	2009-08-24 11:59:51 +02:00
Daniel Veillard	db4ac221f0	Fix a small problem on previous HTML parser patch	2009-08-22 17:58:31 +02:00
Daniel Veillard	e77db16ab1	592430 - HTML parser runs into endless loop * HTMLparser.c: fix the problem with detection erroring absolutely, and properly popping up the stack when in EOF, also passes XML_PARSE_HUGE when decoding options.	2009-08-22 11:32:38 +02:00
Daniel Veillard	7459c595a0	588441 allow '.' in HTML Names even if invalid * HTMLparser.c: just allow '.' in htmlParseHTMLName list of characters	2009-08-13 10:10:29 +02:00
Daniel Veillard	533ec0e073	579317 Try to find the HTML encoding information * HTMLparser.c: if we hit an encoding error before parsing a potential <meta> with the info look in the input buffer to see if we can find it instead of forcing a blind switch to ISO-8859-1	2009-08-12 23:00:22 +02:00
Jiri Netolicky	446e126de5	576368 – htmlChunkParser with special attributes * HTMLparser.c: htmlChunkParsing failed when the chunk ends inside element after some attribute which has a '>' char in its value.	2009-08-07 17:05:36 +02:00
Daniel Veillard	4d3e2da7f8	* HTMLparser.c: make sure we keep line numbers fixes #580705 based Aaron Patterson patch Daniel	2009-05-15 17:55:45 +02:00
Roland Steiner	04f8eef852	* HTMLparser.c: a broken HTML table attributes initialization, fixes #581803, by Roland Steiner <rolandsteiner@google.com> Daniel	2009-05-12 09:16:16 +02:00
Daniel Veillard	7f4547cdbd	preparing the release of 2.7.2 fix the Solaris portability issue * configure.in doc/* NEWS: preparing the release of 2.7.2 * dict.c: fix the Solaris portability issue * parser.c: additional cleanup on #554660 fix * test/ent13 result/ent13* result/noent/ent13: added the example in the regression test suite. HTMLparser.c: handle leading BOM in htmlParseElement() Daniel svn path=/trunk/; revision=3799	2008-10-03 07:58:23 +00:00
Daniel Veillard	a57ba4ce96	fix an HTML parsing error on large data sections reported by Mike Day add * HTMLparser.c: fix an HTML parsing error on large data sections reported by Mike Day * test/HTML/utf8bug.html result/HTML/utf8bug.html.err result/HTML/utf8bug.html.sax result/HTML/utf8bug.html: add the reproducer to the test suite daniel svn path=/trunk/; revision=3797	2008-09-25 16:06:18 +00:00
Daniel Veillard	4cc67bb77e	patch from Robert Schwebel , allows to compile the example if configured * doc/examples/reader3.c: patch from Robert Schwebel , allows to compile the example if configured without output support fixes #545582 * Makefile.am: add testrecurse to the make check tests * HTMLparser.c: if the parser got a encoding argument it should be used over what the meta specifies, patch fixing #536346 Daniel svn path=/trunk/; revision=3785	2008-08-29 19:58:23 +00:00
Daniel Veillard	ae0765b681	more progresses against the official regression tests small cleanup for * runxmlconf.c: more progresses against the official regression tests * runsuite.c: small cleanup for non-leak reports * include/libxml/tree.h: parsing flags and other properties are now added to the document node, this is generally useful and allow to make Name and NmToken validations based on the parser flags, more specifically the 5th edition of XML or not * HTMLparser.c tree.c: small side effects for the previous changes * parser.c SAX2.c valid.c: the bulk of the changes are here, the parser and validation behaviour can be affected, parsing flags need to be copied, lot of changes. Also fixing various validation problems in the regression tests. Daniel svn path=/trunk/; revision=3762	2008-07-31 19:54:59 +00:00
Daniel Veillard	ed86dc2383	applied patch from Ashwin fixing a number of realloc problems improve * uri.c: applied patch from Ashwin fixing a number of realloc problems * HTMLparser.c: improve handling for misplaced html/head/body Daniel svn path=/trunk/; revision=3740	2008-04-24 11:58:41 +00:00
Daniel Veillard	36de63e71d	apparently it's okay to forget the semicolon after entity refs in HTML, * HTMLparser.c: apparently it's okay to forget the semicolon after entity refs in HTML, fixing char refs parsing accordingly based on T. Manske patch, this should fix #517653 Daniel svn path=/trunk/; revision=3726	2008-04-03 09:05:05 +00:00
Daniel Veillard	35fcbb84d2	patch from Arnold Hendriks improving parsing of html within html bogus * HTMLparser.c: patch from Arnold Hendriks improving parsing of html within html bogus data, still not a complete fix though Daniel svn path=/trunk/; revision=3704	2008-03-12 21:43:39 +00:00
Daniel Veillard	c5b43cc03a	avoid stopping parsing when encountering out of range characters in an * HTMLparser.c: avoid stopping parsing when encountering out of range characters in an HTML file, report and continue processing instead, should fix #472696 Daniel svn path=/trunk/; revision=3675	2008-01-11 07:41:39 +00:00
Daniel Veillard	640f89ef61	fix definition for <embed> to avoid error when saving back, patch from * HTMLparser.c: fix definition for <embed> to avoid error when saving back, patch from Stefan Behnel fixing 495213 Daniel svn path=/trunk/; revision=3671	2008-01-11 06:24:09 +00:00
Daniel Veillard	861101d1fa	fixed bug #381877 , avoid reading over the end of stream when generating an * HTMLparser.c: fixed bug #381877, avoid reading over the end of stream when generating an UTF-8 encoding error. Daniel svn path=/trunk/; revision=3627	2007-06-12 08:38:57 +00:00
Daniel Veillard	491e58e575	applied patch from Michael Day to add support for <embed> Daniel * HTMLparser.c: applied patch from Michael Day to add support for <embed> Daniel svn path=/trunk/; revision=3611	2007-05-02 16:15:18 +00:00
Daniel Veillard	739e9d0981	Dohh ! Daniel svn path=/trunk/; revision=3610	2007-04-27 09:33:58 +00:00
Daniel Veillard	4d1320fa5b	Jean-Daniel Dupas pointed a couple of problems in htmlCreateDocParserCtxt. * HTMLparser.c: Jean-Daniel Dupas pointed a couple of problems in htmlCreateDocParserCtxt. Daniel svn path=/trunk/; revision=3609	2007-04-26 08:55:33 +00:00
Daniel Veillard	42720248e6	change the way script/style are parsed to not try to detect comments, * HTMLparser.c: change the way script/style are parsed to not try to detect comments, reported by Mike Day * result/HTML/doc3.*: affects the result of that test Daniel svn path=/trunk/; revision=3598	2007-04-16 07:02:31 +00:00
William M. Brack	e978ae25ca	fixed memory access error on parsing of meta data which had errors (bug * HTMLparser.c: fixed memory access error on parsing of meta data which had errors (bug #382206). Also cleaned up a few warnings by adding some additional DECL macros. svn path=/trunk/; revision=3593	2007-03-21 06:16:02 +00:00
Daniel Veillard	1032ac4c5c	applied patch from Steven Rainwater to fix UTF8ToHtml behaviour on code * HTMLparser.c: applied patch from Steven Rainwater to fix UTF8ToHtml behaviour on code points which are not mappable to predefined HTML entities, fixes #377544 Daniel	2006-11-23 16:18:30 +00:00
Daniel Veillard	772869fe10	change htmlCtxtReset() following Michael Day bug report and suggestion. * HTMLparser.c: change htmlCtxtReset() following Michael Day bug report and suggestion. Daniel	2006-11-08 09:16:56 +00:00

362 Commits
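
Two of the entries above (eeb9932990 and 446e126de5) concern the push parser looking for the end of a tag when the chunk contains a '>' inside an attribute value, or ends in the middle of one. The following is a minimal, self-contained sketch of that idea in plain C. It is not libxml2 code: find_tag_end is a hypothetical helper introduced only for illustration, and it assumes the buffer starts at the '<' of a tag and that attribute values are delimited by single or double quotes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of libxml2): return the index of the '>'
 * that closes the tag held in buf[0..len), or -1 if the data ends before
 * the tag is complete. A '>' inside a quoted attribute value is skipped.
 */
long find_tag_end(const char *buf, size_t len)
{
    char quote = 0;   /* 0 outside a value, otherwise the opening quote */
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        char c = buf[i];

        if (quote != 0) {
            if (c == quote)
                quote = 0;        /* closing quote ends the value */
        } else if (c == '"' || c == '\'') {
            quote = c;            /* entering a quoted value */
        } else if (c == '>') {
            return (long) i;      /* genuine end of the tag */
        }
    }
    /*
     * Incomplete tag. The scan position is deliberately not remembered:
     * resuming later from the middle of an attribute value would lose
     * the "inside quotes" state, so a caller should rescan from the
     * start of the tag once more data has arrived.
     */
    return -1;
}
```

With this sketch, a chunk such as `<a title="x>y` yields -1 and is retried in full after the next chunk is appended, instead of being cut at the '>' inside the value.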