1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

24 Commits

Author SHA1 Message Date
Nick Wellnhofer
080285724b html: Make data parsing modes work with push parser
This can't be solved with a simple scan for a terminator. Instead, we
make htmlParseCharData handle incomplete data if the "partial" flag is
set.
2025-02-02 11:15:45 +01:00
Nick Wellnhofer
f77ec16db0 html: Optimize htmlParseCharData 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
e179f3ec0e html: Stop reporting syntax errors
It doesn't make much sense to keep the old syntax error handling which
doesn't conform to HTML5.

Handling HTML5 parser errors is rather involved and not essential for
parsers.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
17da54c522 html: Normalize newlines 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
5951179239 html: Parse named character references according to HTML5 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
a80f8b64a9 html: Allow attributes in end tags
Attribute are syntactically allowed in HTML5 end tags but otherwise
ignored.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
dcb2abb2fe html: Parse tag and attribute names according to HTML5
HTML5 allows bascially all characters in tag and attribute names.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
93ce33c2b8 Fix several quadratic runtime issues in HTML push parser
Fix a few remaining cases where the HTML push parser would scan more
content during lookahead than being parsed later.

Make sure that htmlParseDocTypeDecl consumes all content up to the
final '>' in case of errors. The old comment said "We shouldn't try to
resynchronize", but ignoring invalid content is also what the HTML5
spec mandates.

Likewise, make htmlParseEndTag skip to the final '>' in invalid end
tags even if not in recovery mode. This is probably the most visible
change in practice and leads to different output for some tests but is
also more in line with HTML5.

Make sure that htmlParsePI and htmlParseComment don't abort if invalid
characters are encountered but log an error and ignore the character.

Change some other end-of-buffer checks to test for a zero byte instead
of relying on IS_CHAR.

Fix usage of IS_CHAR macro in htmlParseScript.
2020-07-23 20:47:35 +02:00
Nick Wellnhofer
0b2d5c48e3 Initialize keepBlanks in HTML parser
This caused failures in the HTML push tests but the fix required to
change the expected output of the HTML SAX tests.
2017-06-12 19:11:54 +02:00
Daniel Veillard
3c080d6d72 Don't give default HTML boolean attribute values in parser
* HTMLparser.c: don't default value of HTML boolean attributes in the
  parser
* SAX2.c: move this to SAX2 tree building backend
* result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax
  result/HTML/wired.html.sax: this changes a few HTML SAX regression
  tests
2010-03-15 15:47:50 +01:00
Daniel Veillard
c47d263049 fixing HTML minimized attribute values to be generated internally if not
* HTMLparser.c: fixing HTML minimized attribute values to be generated
  internally if not present, fixes bug #332124
* result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax
  result/HTML/wired.html.sax: this affects the SAX event strem for
  a few test cases
Daniel
2006-10-17 16:13:27 +00:00
Daniel Veillard
36d73403ff Applied the last patch from Gary Coady for #304637 changing the behaviour
* HTMLparser.c: Applied the last patch from Gary Coady for #304637
  changing the behaviour when text nodes are found in body
* result/HTML/*: this changes the output of some tests
Daniel
2005-09-01 09:52:30 +00:00
Daniel Veillard
0a2a163d2e - HTMLparser.c: Patch from Jonas Borgstrm
(htmlGetEndPriority): New function, returns
the priority of a certain element.
(htmlAutoCloseOnClose): Only close inline elements if they
all have lower or equal priority.
- result/HTML: this of course changed a number of tests results.
Daniel
2001-05-11 14:18:03 +00:00
Daniel Veillard
a2bc368bc9 - HTMLparser.c: trying to fix the problem reported by Jonas Borgstrm
- results/HTML/ : a few changes in the output of the HTML tests as
  a result.
- configure.in: tying to fix -liconv where needed
Daniel
2001-05-03 08:27:20 +00:00
Daniel Veillard
56098d4f35 - HTMLparser.c : HTML parsing still sucks ... trying to deal
with madness
- result/HTML/ : this modified the result of the regression tests
  a lot.
Daniel
2001-04-24 12:51:09 +00:00
Daniel Veillard
a3bfca59bf parsing real HTML is a nightmare.
- HTMLparser.c result/HTML/*: revamped the way the HTML
  parser handles end of tags or end of input
Daniel
2001-04-12 15:42:58 +00:00
Daniel Veillard
126f27992d Bunch of fixes, finishing moving datastructures to the hash stuff:
- hash.[ch] debugXML.c: expanded/enhanced the API, added
  multikey tuples, made hash structure opaque
- valid.[ch]: moved elements, attributes, notations decalarations
  as well as ID and refs to hash tables.
- entities.c: hash cleanup
- xmlmemory.c: fixed a dump problem in debug mode
- include/Makefile.am: problem passing in DESTDIR= values patch
  from Marc Christensen <marc@calderasystems.com>
- nanohttp.c: removed debugging remains
- HTMLparser.c: the bogus tag should be ignored (Wayne)
- HTMLparser.c parser.c: fixing a number of problems with the
  macros in the *parser.c files (Wayne).
- HTMLparser.c: close the previous option when opening a new one
  (Marc Sanfacon).
- result/HTML/*: updated the HTML results accordingly
Daniel
2000-10-24 17:10:12 +00:00
Daniel Veillard
970112a914 Stupid bug fix on the HTML parser:
- HTMLparser.c: Doohhh, attribute name parsing was still case
  sensitive ! Fixed this ...
- result/HTML/* : updated the tests results accordingly
Daniel
2000-10-03 09:33:21 +00:00
Daniel Veillard
4948eb4fd4 - HTMLparser.c testHTML.c: applied two new patches from
Wayne Davison <wayned@blorf.net>
- result/HTML/*.sax: regenerated HTML SAX output
- parser.c: more cleanup.
Daniel
2000-08-29 09:41:15 +00:00
Daniel Veillard
e010c17d78 Mostly HTML generation and parsing enhancements:
- HTMLparser.[ch] testHTML.c: applied the second set of
  patches from Wayne Davison <wayned@blorf.net>, adding
  htmlEncodeEntities()
- HTMLparser.c: fixed an ignorable white space detection bug
  occuring when parsing with SAX only
- result/HTML/*.sax: updated since the output is now HTML
  encoded...
Daniel.
2000-08-28 10:04:51 +00:00
Daniel Veillard
b8f25c9118 work done on auto-opening of <p> tags and cleanup of SAX output, Daniel. 2000-08-19 19:52:36 +00:00
Daniel Veillard
808a3f1f9f cleaned up the output of SAX tests, Daniel 2000-08-17 13:50:51 +00:00
Daniel Veillard
1255ab7780 Patch from Dave Yearke <yearke@eng.buffalo.edu>:
- testHTML.c: fix core dump on Solaris 2.x systems
- HTMLparser.c: fix segfault if ctxt->sax->characters() is NULL
- result/HTML/*.sax: previous bug fix lead to new results
Daniel
2000-08-14 15:13:33 +00:00
Daniel Veillard
87b9539573 Large sync between my W3C base and Gnome's one:
- parser.[ch]: added xmlGetFeaturesList() xmlGetFeature() and xmlAddFeature()
- tree.[ch]: added xmlAddChildList()
- xmllint.c: MAP_FAILED macro test
- parser.h: added xmlParseCtxtExternalEntity()
- valid.c: applied bug fixes removed warning
- tree.c: added CDATA block to elements content
- testSAX.c: cleanup of output
- testHTML.c: added SAX testing
- encoding.c: better error recovery
- SAX.c, parser.c: fixed one of the external entity processing of the OASis testsuite
- Makefile.am: added HTML SAX regression tests
- configure.in: bumped to 2.2.2
- test/HTML/ result/HTML: added a few of HTML tests, and added the SAX results

Daniel
2000-08-12 21:12:04 +00:00