libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	080285724b	html: Make data parsing modes work with push parser This can't be solved with a simple scan for a terminator. Instead, we make htmlParseCharData handle incomplete data if the "partial" flag is set.	2025-02-02 11:15:45 +01:00
Nick Wellnhofer	f77ec16db0	html: Optimize htmlParseCharData	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	e179f3ec0e	html: Stop reporting syntax errors It doesn't make much sense to keep the old syntax error handling which doesn't conform to HTML5. Handling HTML5 parser errors is rather involved and not essential for parsers.	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	17da54c522	html: Normalize newlines	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	5951179239	html: Parse named character references according to HTML5	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	a80f8b64a9	html: Allow attributes in end tags Attributes are syntactically allowed in HTML5 end tags but otherwise ignored.	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	dcb2abb2fe	html: Parse tag and attribute names according to HTML5 HTML5 allows basically all characters in tag and attribute names.	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	93ce33c2b8	Fix several quadratic runtime issues in HTML push parser Fix a few remaining cases where the HTML push parser would scan more content during lookahead than being parsed later. Make sure that htmlParseDocTypeDecl consumes all content up to the final '>' in case of errors. The old comment said "We shouldn't try to resynchronize", but ignoring invalid content is also what the HTML5 spec mandates. Likewise, make htmlParseEndTag skip to the final '>' in invalid end tags even if not in recovery mode. This is probably the most visible change in practice and leads to different output for some tests but is also more in line with HTML5. Make sure that htmlParsePI and htmlParseComment don't abort if invalid characters are encountered but log an error and ignore the character. Change some other end-of-buffer checks to test for a zero byte instead of relying on IS_CHAR. Fix usage of IS_CHAR macro in htmlParseScript.	2020-07-23 20:47:35 +02:00
Nick Wellnhofer	0b2d5c48e3	Initialize keepBlanks in HTML parser This caused failures in the HTML push tests but the fix required to change the expected output of the HTML SAX tests.	2017-06-12 19:11:54 +02:00
Daniel Veillard	3c080d6d72	Don't give default HTML boolean attribute values in parser * HTMLparser.c: don't default value of HTML boolean attributes in the parser * SAX2.c: move this to SAX2 tree building backend * result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax result/HTML/wired.html.sax: this changes a few HTML SAX regression tests	2010-03-15 15:47:50 +01:00
Daniel Veillard	c47d263049	fixing HTML minimized attribute values to be generated internally if not * HTMLparser.c: fixing HTML minimized attribute values to be generated internally if not present, fixes bug #332124 * result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax result/HTML/wired.html.sax: this affects the SAX event stream for a few test cases Daniel	2006-10-17 16:13:27 +00:00
Daniel Veillard	36d73403ff	Applied the last patch from Gary Coady for #304637 changing the behaviour * HTMLparser.c: Applied the last patch from Gary Coady for #304637 changing the behaviour when text nodes are found in body * result/HTML/*: this changes the output of some tests Daniel	2005-09-01 09:52:30 +00:00
Daniel Veillard	0a2a163d2e	- HTMLparser.c: Patch from Jonas Borgström (htmlGetEndPriority): New function, returns the priority of a certain element. (htmlAutoCloseOnClose): Only close inline elements if they all have lower or equal priority. - result/HTML: this of course changed a number of tests results. Daniel	2001-05-11 14:18:03 +00:00
Daniel Veillard	a2bc368bc9	- HTMLparser.c: trying to fix the problem reported by Jonas Borgström - results/HTML/ : a few changes in the output of the HTML tests as a result. - configure.in: trying to fix -liconv where needed Daniel	2001-05-03 08:27:20 +00:00
Daniel Veillard	56098d4f35	- HTMLparser.c : HTML parsing still sucks ... trying to deal with madness - result/HTML/ : this modified the result of the regression tests a lot. Daniel	2001-04-24 12:51:09 +00:00
Daniel Veillard	a3bfca59bf	parsing real HTML is a nightmare. - HTMLparser.c result/HTML/*: revamped the way the HTML parser handles end of tags or end of input Daniel	2001-04-12 15:42:58 +00:00
Daniel Veillard	126f27992d	Bunch of fixes, finishing moving datastructures to the hash stuff: - hash.[ch] debugXML.c: expanded/enhanced the API, added multikey tuples, made hash structure opaque - valid.[ch]: moved elements, attributes, notations declarations as well as ID and refs to hash tables. - entities.c: hash cleanup - xmlmemory.c: fixed a dump problem in debug mode - include/Makefile.am: problem passing in DESTDIR= values patch from Marc Christensen <marc@calderasystems.com> - nanohttp.c: removed debugging remains - HTMLparser.c: the bogus tag should be ignored (Wayne) - HTMLparser.c parser.c: fixing a number of problems with the macros in the parser.c files (Wayne). - HTMLparser.c: close the previous option when opening a new one (Marc Sanfacon). - result/HTML/: updated the HTML results accordingly Daniel	2000-10-24 17:10:12 +00:00
Daniel Veillard	970112a914	Stupid bug fix on the HTML parser: - HTMLparser.c: Doohhh, attribute name parsing was still case sensitive ! Fixed this ... - result/HTML/* : updated the tests results accordingly Daniel	2000-10-03 09:33:21 +00:00
Daniel Veillard	4948eb4fd4	- HTMLparser.c testHTML.c: applied two new patches from Wayne Davison <wayned@blorf.net> - result/HTML/*.sax: regenerated HTML SAX output - parser.c: more cleanup. Daniel	2000-08-29 09:41:15 +00:00
Daniel Veillard	e010c17d78	Mostly HTML generation and parsing enhancements: - HTMLparser.[ch] testHTML.c: applied the second set of patches from Wayne Davison <wayned@blorf.net>, adding htmlEncodeEntities() - HTMLparser.c: fixed an ignorable white space detection bug occurring when parsing with SAX only - result/HTML/*.sax: updated since the output is now HTML encoded... Daniel.	2000-08-28 10:04:51 +00:00
Daniel Veillard	b8f25c9118	work done on auto-opening of <p> tags and cleanup of SAX output, Daniel.	2000-08-19 19:52:36 +00:00
Daniel Veillard	808a3f1f9f	cleaned up the output of SAX tests, Daniel	2000-08-17 13:50:51 +00:00
Daniel Veillard	1255ab7780	Patch from Dave Yearke <yearke@eng.buffalo.edu>: - testHTML.c: fix core dump on Solaris 2.x systems - HTMLparser.c: fix segfault if ctxt->sax->characters() is NULL - result/HTML/*.sax: previous bug fix led to new results Daniel	2000-08-14 15:13:33 +00:00
Daniel Veillard	87b9539573	Large sync between my W3C base and Gnome's one: - parser.[ch]: added xmlGetFeaturesList() xmlGetFeature() and xmlAddFeature() - tree.[ch]: added xmlAddChildList() - xmllint.c: MAP_FAILED macro test - parser.h: added xmlParseCtxtExternalEntity() - valid.c: applied bug fixes removed warning - tree.c: added CDATA block to elements content - testSAX.c: cleanup of output - testHTML.c: added SAX testing - encoding.c: better error recovery - SAX.c, parser.c: fixed one of the external entity processing of the OASIS testsuite - Makefile.am: added HTML SAX regression tests - configure.in: bumped to 2.2.2 - test/HTML/ result/HTML: added a few of HTML tests, and added the SAX results Daniel	2000-08-12 21:12:04 +00:00

24 Commits