mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00
Commit Graph

72 Commits

Author SHA1 Message Date
Pranjal Jumde
0bcd05c5cd Heap-based buffer overread in htmlCurrentChar
For https://bugzilla.gnome.org/show_bug.cgi?id=758606

* parserInternals.c:
(xmlNextChar): Add a test to catch other issues on ctxt->input
corruption proactively.
For non-UTF-8 charsets, xmlNextChar() failed to check for the end
of the input buffer and would continue reading.  Fix this by
pulling out the check for the end of the input buffer into common
code, and return if we reach the end of the input buffer
prematurely.
* result/HTML/758606.html: Added.
* result/HTML/758606.html.err: Added.
* result/HTML/758606.html.sax: Added.
* result/HTML/758606_2.html: Added.
* result/HTML/758606_2.html.err: Added.
* result/HTML/758606_2.html.sax: Added.
* test/HTML/758606.html: Added test case.
* test/HTML/758606_2.html: Added test case.
2016-05-23 15:01:07 +08:00
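As a rough illustration of the xmlNextChar() fix in 0bcd05c5cd, the sketch below hoists the end-of-buffer check ahead of the charset-specific branches. It is not libxml2 code: `input_t`, `next_char`, and the field names are hypothetical stand-ins for the parser input structure, and the UTF-8 length logic is deliberately simplified.

```c
#include <stddef.h>

/* Hypothetical stand-in for a parser input buffer (not libxml2's type). */
typedef struct {
    const unsigned char *cur;  /* next byte to consume */
    const unsigned char *end;  /* one past the last valid byte */
} input_t;

/*
 * Advance by one character. Returns the number of bytes consumed, or 0
 * if the end of the buffer was reached or the input looks corrupt.
 * The bounds check runs before any charset-specific code, so the
 * single-byte path can no longer read past `end`.
 */
size_t next_char(input_t *in, int is_utf8)
{
    size_t len = 1;

    /* Common check, shared by every charset. */
    if (in == NULL || in->cur == NULL || in->end == NULL ||
        in->cur >= in->end)
        return 0;

    if (is_utf8) {
        unsigned char c = *in->cur;

        if (c >= 0xF0)
            len = 4;
        else if (c >= 0xE0)
            len = 3;
        else if (c >= 0xC0)
            len = 2;

        /* A truncated multi-byte sequence must not run off the end. */
        if ((size_t)(in->end - in->cur) < len)
            return 0;
    }

    in->cur += len;
    return len;
}
```

A caller would treat a return value of 0 as "stop parsing" rather than dereferencing `cur` again.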
Hugh Davenport
beca86e8c8 Detect change of encoding when parsing HTML names
From https://bugzilla.gnome.org/show_bug.cgi?id=758518

Happens when a file has a name getting parsed, but no valid encoding
set, so libxml has to guess what the encoding is. This patch detects
when the buffer location changes, and if it does, restarts the parsing
of the name.

This slightly changes the output of a couple of regression tests
2016-05-23 15:01:07 +08:00
Pranjal Jumde
a820dbeac2 Bug 758605: Heap-based buffer overread in xmlDictAddString <https://bugzilla.gnome.org/show_bug.cgi?id=758605>
Reviewed by David Kilzer.

* HTMLparser.c:
(htmlParseName): Add bounds check.
(htmlParseNameComplex): Ditto.
* result/HTML/758605.html: Added.
* result/HTML/758605.html.err: Added.
* result/HTML/758605.html.sax: Added.
* runtest.c:
(pushParseTest): The input for the new test case was so small
(4 bytes) that htmlParseChunk() was never called after
htmlCreatePushParserCtxt(), thereby creating a false positive
test failure.  Fixed by using a do-while loop so we always call
htmlParseChunk() at least once.
* test/HTML/758605.html: Added.
2016-05-23 15:01:07 +08:00
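The do-while change to pushParseTest in a820dbeac2 can be sketched roughly as follows. This is not the code from runtest.c: `feed_chunks`, `chunk_fn`, and `count_last` are hypothetical names, and the callback stands in for whatever hands bytes to the push parser. The point is only that the loop body runs once even when the bytes consumed at context creation already cover the whole input.

```c
#include <stddef.h>

/* Hypothetical chunk consumer; a real caller would pass the bytes to a
 * push parser here. `last` is nonzero on the terminating call. */
typedef int (*chunk_fn)(void *user, const char *data, size_t len, int last);

/*
 * Feed bytes [start, size) of `base` to `feed` in pieces of at most
 * `chunk` bytes. Bytes before `start` are assumed to have been given to
 * the parser when it was created. The do-while guarantees one call even
 * when start == size, so the terminating call is never skipped.
 * Returns the number of calls made, or 0 on invalid arguments.
 */
int feed_chunks(const char *base, size_t size, size_t start, size_t chunk,
                chunk_fn feed, void *user)
{
    size_t cur = start;
    int calls = 0;

    if (chunk == 0 || start > size)
        return 0;

    do {
        size_t n = size - cur;
        int last = 1;

        if (n > chunk) {
            n = chunk;
            last = 0;
        }
        feed(user, base + cur, n, last);
        cur += n;
        calls++;
    } while (cur < size);

    return calls;
}

/* Example consumer: counts how many calls carried the "last" flag. */
int count_last(void *user, const char *data, size_t len, int last)
{
    (void)data;
    (void)len;
    if (last)
        ++*(int *)user;
    return 0;
}
```

With a 4-byte input and `start` equal to 4, a plain `while (cur < size)` loop would make no calls at all, which is the false positive the commit message describes.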
Daniel Veillard
f933c89813 Keep non-significant blanks node in HTML parser
For https://bugzilla.gnome.org/show_bug.cgi?id=681822

Regardless of whether the option HTML_PARSE_NOBLANKS is set, blank nodes
are removed from an HTML document, for example:

<html>
  <head>
    <title>This is a test.</title>
  </head>
  <body>
    <p>This is a test.</p>
  </body>
</html>

is read as:

<html><head><title>This is a test.</title></head><body>
    <p>This is a test.</p>
  </body></html>

This changes the default behaviour but the old behaviour is available
as expected when using the parser flag HTML_PARSE_NOBLANKS

Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com>

* HTMLparser.c: change various places in the parser where ignorable_space
  SAX callback was called without checking for the parser flag preference
* xmllint.c: make sure we use the new flag even for HTML parsing
* result/HTML/*: this modifies the output of a number of tests
2012-09-07 19:32:12 +08:00
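The behaviour change in f933c89813 amounts to consulting the parser option before discarding a whitespace-only text run. A minimal sketch of that decision follows, assuming a simplified notion of "blank"; `SKETCH_NOBLANKS`, `is_blank_text`, and `keep_text_node` are hypothetical names, and the real parser's rules for ignorable space are more involved than this.

```c
#include <stddef.h>

/* Hypothetical option bit for this sketch; the flag named in the commit
 * message is HTML_PARSE_NOBLANKS. */
#define SKETCH_NOBLANKS 1

/* Return 1 if the text consists only of space, tab, CR or LF. */
int is_blank_text(const char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (s[i] != ' ' && s[i] != '\t' && s[i] != '\r' && s[i] != '\n')
            return 0;
    }
    return 1;
}

/*
 * Decide whether a text run becomes a node in the tree. Blank runs are
 * dropped only when the caller explicitly asked for that; by default
 * they are kept.
 */
int keep_text_node(const char *s, size_t len, int options)
{
    if ((options & SKETCH_NOBLANKS) && is_blank_text(s, len))
        return 0;
    return 1;
}
```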
Denis Pauk
a0cd075d94 HTML parser error with <noscript> in the <head>
For https://bugzilla.gnome.org/show_bug.cgi?id=615785
When the <noscript> is found, <head> is closed and a <body> element is created.
The real <body id="xxx"> gets skipped over, so I can't see any of the
body's attributes.
Just don't close <head> when encountering a <noscript>
Add a regression test too
2012-05-11 19:31:12 +08:00
Denis Pauk
868d92da89 Add HTML parser support for HTML5 meta charset encoding declaration
For https://bugzilla.gnome.org/show_bug.cgi?id=655218

http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element

"""
The charset attribute specifies the character encoding used by the document.
This is a character encoding declaration. If the attribute is present in an XML
document, its value must be an ASCII case-insensitive match for the string
"UTF-8" (and the document is therefore forced to use UTF-8 as its
encoding).
"""

However, while <meta http-equiv="Content-Type" content="text/html;
charset=utf8"> works, <meta charset="utf8"> does not.

While libxml2 HTML parser is not tuned for HTML5, this is a simple
addition

Also added a testcase
2012-05-10 15:34:57 +08:00
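To illustrate the two declaration forms mentioned in 868d92da89, the sketch below extracts an encoding name either from a `charset` attribute or from the `charset=` parameter inside a `content` attribute. It is a simplified, hypothetical helper (`meta_charset`, `find_nocase`, `copy_token` are not libxml2 functions) and assumes the attribute values have already been isolated by the caller.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for `needle` in `hay`; NULL if absent. */
static const char *find_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    for (; *hay != '\0'; hay++) {
        size_t i = 0;

        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return hay;
    }
    return NULL;
}

/* Copy an encoding name, stopping at ';', a quote, or whitespace.
 * Returns 1 if at least one character was copied. */
static int copy_token(const char *src, char *out, size_t outsz)
{
    size_t i = 0;

    while (src[i] != '\0' && src[i] != ';' && src[i] != '"' &&
           src[i] != '\'' && !isspace((unsigned char)src[i]) &&
           i + 1 < outsz) {
        out[i] = src[i];
        i++;
    }
    out[i] = '\0';
    return i > 0;
}

/*
 * Prefer <meta charset="...">; otherwise fall back to the charset=
 * parameter of <meta http-equiv="Content-Type" content="...">.
 * Returns 1 and fills `out` on success, 0 otherwise.
 */
int meta_charset(const char *charset_attr, const char *content_attr,
                 char *out, size_t outsz)
{
    const char *p;

    if (outsz == 0)
        return 0;
    if (charset_attr != NULL)
        return copy_token(charset_attr, out, outsz);
    if (content_attr != NULL &&
        (p = find_nocase(content_attr, "charset=")) != NULL)
        return copy_token(p + strlen("charset="), out, outsz);
    out[0] = '\0';
    return 0;
}
```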
Daniel Veillard
3c080d6d72 Don't give default HTML boolean attribute values in parser
* HTMLparser.c: don't default value of HTML boolean attributes in the
  parser
* SAX2.c: move this to SAX2 tree building backend
* result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax
  result/HTML/wired.html.sax: this changes a few HTML SAX regression
  tests
2010-03-15 15:47:50 +01:00
Daniel Veillard
a57ba4ce96 fix an HTML parsing error on large data sections reported by Mike Day add
* HTMLparser.c: fix an HTML parsing error on large data sections
  reported by Mike Day
* test/HTML/utf8bug.html result/HTML/utf8bug.html.err
  result/HTML/utf8bug.html.sax result/HTML/utf8bug.html: add the
  reproducer to the test suite
daniel

svn path=/trunk/; revision=3797
2008-09-25 16:06:18 +00:00
Daniel Veillard
42720248e6 change the way script/style are parsed to not try to detect comments,
* HTMLparser.c: change the way script/style are parsed to
  not try to detect comments, reported by Mike Day
* result/HTML/doc3.*: affects the result of that test
Daniel

svn path=/trunk/; revision=3598
2007-04-16 07:02:31 +00:00
Daniel Veillard
c47d263049 fixing HTML minimized attribute values to be generated internally if not
* HTMLparser.c: fixing HTML minimized attribute values to be generated
  internally if not present, fixes bug #332124
* result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax
  result/HTML/wired.html.sax: this affects the SAX event stream for
  a few test cases
Daniel
2006-10-17 16:13:27 +00:00
Daniel Veillard
48519092e5 fixing HTML entities in attributes parsing bug #362552 added to the
* HTMLparser.c: fixing HTML entities in attributes parsing bug #362552
* result/HTML/entities2.html* test/HTML/entities2.html: added to
  the regression suite
Daniel
2006-10-17 15:56:35 +00:00
Daniel Veillard
b990008f05 script HTML parser error fix, corrects bug #319715 added test from Michael
* HTMLparser.c: script HTML parser error fix, corrects bug #319715
* result/HTML/53867* test/HTML/53867.html: added test from Michael Day
  to the regression suite
Daniel
2005-10-25 12:36:29 +00:00
Daniel Veillard
36d73403ff Applied the last patch from Gary Coady for #304637 changing the behaviour
* HTMLparser.c: Applied the last patch from Gary Coady for #304637
  changing the behaviour when text nodes are found in body
* result/HTML/*: this changes the output of some tests
Daniel
2005-09-01 09:52:30 +00:00
Daniel Veillard
b8c8016044 fixed bug #310333 with a patch close to the provided patch for HTML UTF-8
* HTMLtree.c: fixed bug #310333 with a patch close to the provided
  patch for HTML UTF-8 serialization
* result/HTML/script2.html: this changed the output of that test
Daniel
2005-08-08 13:46:45 +00:00
Daniel Veillard
358fef4b1e applied UTF-8 script parsing bug #310229 fix from Jiri Netolicky added the
* HTMLparser.c: applied UTF-8 script parsing bug #310229 fix from
  Jiri Netolicky
* result/HTML/script2.html* test/HTML/script2.html: added the test
  case to the regression suite
Daniel
2005-07-13 16:37:38 +00:00
Daniel Veillard
597f1c1f34 applied patch from James Bursa fixing an html parsing bug in push mode
* HTMLparser.c: applied patch from James Bursa fixing an html parsing
  bug in push mode
* result/HTML/repeat.html* test/HTML/repeat.html: added the test to the
  regression suite
Daniel
2005-07-03 23:00:18 +00:00
Daniel Veillard
fc484dd0a0 added support for HTML PIs #156087 added specific tests Daniel
* HTMLparser.c: added support for HTML PIs #156087
* test/HTML/python.html result/HTML/python.html*: added specific tests
Daniel
2004-10-22 14:34:23 +00:00
Daniel Veillard
18a65095e0 fix to the fix for #141864 from Paul Elseth apply fix from David Gatwood
* xmlIO.c: fix to the fix for #141864 from Paul Elseth
* HTMLparser.c result/HTML/doc3.htm: apply fix from David Gatwood for
  #141195 about text between comments.
Daniel
2004-05-11 15:57:42 +00:00
Daniel Veillard
42fd412637 change --html to make sure we use the HTML serialization rule by default
* xmllint.c: change --html to make sure we use the HTML serialization
  rule by default when HTML parser is used, add --xmlout to allow to
  force the XML serializer on HTML.
* HTMLtree.c: ugly tweak to fix the output on <p> element and
  solve #125093
* result/HTML/*: this changes the output of some tests
Daniel
2003-11-04 08:47:48 +00:00
Daniel Veillard
652f9aa966 Fix #124907 by simply backporting the same fix as for the XML parser
* HTMLparser.c: Fix #124907 by simply backporting the same
  fix as for the XML parser
* result/HTML/doc3.htm.err: change to ID detecting modified one
  test result.
Daniel
2003-10-28 22:04:45 +00:00
Daniel Veillard
05bcb7ed30 fixed to not send NULL to %s printing cleaning up some of the regression
* HTMLparser.c: fixed to not send NULL to %s printing
* python/tests/error.py result/HTML/doc3.htm.err
  result/HTML/test3.html.err result/HTML/wired.html.err
  result/valid/t8.xml.err result/valid/t8a.xml.err: cleaning
  up some of the regression tests error
Daniel
2003-10-19 14:26:34 +00:00
Daniel Veillard
f403d298c3 more code cleanup, especially around error messages, the HTML parser has
* HTMLparser.c Makefile.am legacy.c parser.c parserInternals.c
  include/libxml/xmlerror.h: more code cleanup, especially around
  error messages, the HTML parser has now been upgraded to the new
  handling.
* result/HTML/*: a few changes in the resulting error messages
Daniel
2003-10-05 13:51:35 +00:00
Daniel Veillard
4b1577f14a removing the SAXresults tree, keeping result in the same tree, added
* Makefile.am results/*.sax SAXResult/*: removing the SAXresults
  tree, keeping result in the same tree, added SAXtests to the
  default "make tests"
Daniel
2003-09-03 13:10:37 +00:00
Daniel Veillard
20aa0fb478 fixed a small problem in the patch for #118763 this reverts back to the
* tree.c: fixed a small problem in the patch for #118763
* result/HTML/doc3.htm*: this reverts back to the previous result
Daniel
2003-08-04 19:43:15 +00:00
Daniel Veillard
39057f40d6 fixing HTML attribute serialization bug #118763 applying a modified
* tree.c: fixing HTML attribute serialization bug #118763
  applying a modified version of the patch from Bacek
* result/HTML/doc3.htm*: this modifies the output from one test
Daniel
2003-08-04 01:33:43 +00:00
Daniel Veillard
8265a18a6a do not generate &quot; for " outside of attributes this changes the output
* entities.c: do not generate &quot; for " outside of attributes
* result//*: this changes the output of some tests
Daniel
2003-06-13 10:05:56 +00:00
William M. Brack
3b811174f7 Updated testfiles for error.c fix 2003-05-14 02:53:43 +00:00
Daniel Veillard
ef0b450163 fixed some problems related to #75813 about handling of Result Value Trees
* xpath.c: fixed some problems related to #75813 about handling
  of Result Value Trees
Daniel
2003-03-24 13:57:34 +00:00
Daniel Veillard
77a90a7f8e patch from johan@evenhuis.nl for #107937 fixing some line counting
* HTMLparser.c parser.c parserInternals.c: patch from
  johan@evenhuis.nl for #107937 fixing some line counting
  problems, and some other cleanups.
* result/HTML/: this results in some line number changes
Daniel
2003-03-22 00:04:05 +00:00
Daniel Veillard
fee408f5eb final touch at closing #87235 </p> end tags need to be generated. this
* HTMLparser.c: final touch at closing #87235 </p> end tags
  need to be generated.
* result/HTML/cf_128.html result/HTML/test2.html result/HTML/test3.html:
  this slightly changes the output of a few tests
* doc/*: regenerated
Daniel
2002-11-22 13:18:30 +00:00
Daniel Veillard
ce02dbc430 Mikhail Sogrine pointed out a bug in HTML parsing, applied his patch added
* HTMLparser.c: Mikhail Sogrine pointed out a bug in HTML
  parsing, applied his patch
* result/HTML/attrents.html result/HTML/attrents.html.err
  result/HTML/attrents.html.sax test/HTML/attrents.html:
  added the test and result case provided by Mikhail Sogrine
Daniel
2002-10-22 19:14:58 +00:00
Daniel Veillard
8c9872ca2e trying to fix 87235 about discarded white spaces in the HTML parser. this
* HTMLparser.c: trying to fix 87235 about discarded white
  spaces in the HTML parser.
* result/HTML/*: this changes the output of a number of HTML
  regression tests
Daniel
2002-07-05 18:17:10 +00:00
Daniel Veillard
6231e84559 fixed & serialization bug introduced in 2.4.20 this changes a few things
* HTMLtree.c: fixed & serialization bug introduced in 2.4.20
* result/HTML/*: this changes a few things in the results
Daniel
2002-04-18 11:54:04 +00:00
Daniel Veillard
eb475a37df fixing bug #78662 i.e. add proper escaping of URI when saving HTML files.
* HTMLtree.c uri.c: fixing bug #78662 i.e. add proper
  escaping of URI when saving HTML files.
* result/HTML/*: this impacted some tests
Daniel
2002-04-14 22:00:22 +00:00
Daniel Veillard
c1f78343b6 fix comment in scripts element parsing. updated the results. Daniel
* HTMLparser.c: fix comment in scripts element parsing.
* result/HTML/doc3*: updated the results.
Daniel
2001-11-10 11:43:05 +00:00
Daniel Veillard
957fdcf2a3 handle the case of < in quoted attributes, Bastian Kleineidam Daniel
* HTMLparser.c test/HTML/lt.html result/HTML/lt.html*:
  handle the case of < in quoted attributes, Bastian Kleineidam
Daniel
2001-11-06 22:50:19 +00:00
Daniel Veillard
166982816e do not output hexadecimal charrefs when serializing HTML since some
* encoding.c entities.c: do not output hexadecimal charrefs
  when serializing HTML since some versions of Netscape can't
  grok it, generate decimal ones.
* result/HTML/doc3.htm: output changed due to previous test
* parserInternals.c: repair xmlKeepBlanksDefault() broken in 2.4.4
Daniel
2001-09-14 10:29:27 +00:00
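The serialization change in 166982816e, emitting decimal rather than hexadecimal character references, can be sketched as below. `decimal_charref` is a hypothetical helper for illustration, not the function used in encoding.c or entities.c.

```c
#include <stdio.h>

/*
 * Write a decimal numeric character reference (for example &#233;
 * instead of &#xE9;) for code point `cp` into `out`.
 * Returns the number of characters written, or -1 if `out` is too small.
 */
int decimal_charref(unsigned long cp, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "&#%lu;", cp);

    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return n;
}
```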
Daniel Veillard
02bb170a8b - HTMLparser.[ch] HTMLtree.c: stored the inline/block property
of element and use it to avoid outputting formatting spaces at
  the wrong place. Implemented the format parameter for HTML save.
- result/HTML/doc2.htm result/HTML/doc3.htm result/HTML/fp40.htm
  result/HTML/script.html result/HTML/test2.html result/HTML/test3.html
  result/HTML/wired.html: of course this impacts the result of a
  number of HTML tests
Daniel
2001-06-13 21:11:59 +00:00
Daniel Veillard
f0c5376a03 - HTMLtree.c: when in a pre element no formatting space should
be added.
- test/HTML/pre.html result/HTML/pre.html*: added a regression test
Daniel
2001-06-07 16:07:07 +00:00
Daniel Veillard
f69bb4b5bf - HTMLparser.c: Closed bug #54891
- result/HTML/cf_128.html* test/HTML/cf_128.html: added the test
  to the suite
forgot to commit this one yesterday
- encoding.h hash.c nanoftp.h parser.h tree.h uri.h xlink.h xpointer.c:
  applied a documentation patch from LotR and filled in a few missing
  descriptions
Daniel
2001-05-19 13:24:56 +00:00
Daniel Veillard
0a2a163d2e - HTMLparser.c: Patch from Jonas Borgström
(htmlGetEndPriority): New function, returns
the priority of a certain element.
(htmlAutoCloseOnClose): Only close inline elements if they
all have lower or equal priority.
- result/HTML: this of course changed a number of tests results.
Daniel
2001-05-11 14:18:03 +00:00
Daniel Veillard
a2bc368bc9 - HTMLparser.c: trying to fix the problem reported by Jonas Borgström
- results/HTML/ : a few changes in the output of the HTML tests as
  a result.
- configure.in: trying to fix -liconv where needed
Daniel
2001-05-03 08:27:20 +00:00
Daniel Veillard
56098d4f35 - HTMLparser.c : HTML parsing still sucks ... trying to deal
with madness
- result/HTML/ : this modified the result of the regression tests
  a lot.
Daniel
2001-04-24 12:51:09 +00:00
Daniel Veillard
a3bfca59bf parsing real HTML is a nightmare.
- HTMLparser.c result/HTML/*: revamped the way the HTML
  parser handles end of tags or end of input
Daniel
2001-04-12 15:42:58 +00:00
Daniel Veillard
760f4426f7 Couple of fixes, getting ready for 2.3.1:
- configure.in: applied patch from Daniel van Balen for OpenBSD
  and bumped version to 2.3.1
- HTMLtree.c result/HTML/doc3.htm result/HTML/wired.html: the
  attempt to find autoclosing was simply broken, removed it,
  updated the examples, this is better
Daniel
2001-02-15 14:59:48 +00:00
Daniel Veillard
f41fbbf6a9 testing and bug fixing related to XSLT:
- xpath.c result/XPath/tests/chaptersprefol: bugfixes on order and
  on predicate
- HTMLparser.[ch] HTMLtree.c result/HTML/doc3.htm.err
  result/HTML/doc3.htm.sax result/HTML/wired.html: sometimes one
  really want to have tags closed on output even if we accept
  unclosed ones on input
Daniel
2001-02-13 17:05:35 +00:00
Daniel Veillard
f62ceffb7e General fixes, XPointer improvements:
- HTMLparser.c: some fixes on auto-open of html/head/body
- encoding.c: fixed a compilation error on some gcc env
- xpath.c xpointer.[ch] xpathInternals.h: improved the
  XPointer implementation
- test/XPath/xptr/strpoint test/XPath/xptr/strrange3: added
  related XPointer tests and associated results
Daniel
2000-11-24 23:36:01 +00:00
Daniel Veillard
c4f4f0b76f - xpath.c: fixed the root evaluation problems
- HTMLparser.c result/HTML/doc3.htm: fixed the problem of non
  ignorable spaces with <b> <bold> <em>
- tree.c: fixed a loop in xmlSearchNsByHref()
Daniel
2000-10-29 17:46:30 +00:00
Daniel Veillard
126f27992d Bunch of fixes, finishing moving datastructures to the hash stuff:
- hash.[ch] debugXML.c: expanded/enhanced the API, added
  multikey tuples, made hash structure opaque
- valid.[ch]: moved elements, attributes, notations declarations
  as well as ID and refs to hash tables.
- entities.c: hash cleanup
- xmlmemory.c: fixed a dump problem in debug mode
- include/Makefile.am: problem passing in DESTDIR= values patch
  from Marc Christensen <marc@calderasystems.com>
- nanohttp.c: removed debugging remains
- HTMLparser.c: the bogus tag should be ignored (Wayne)
- HTMLparser.c parser.c: fixing a number of problems with the
  macros in the *parser.c files (Wayne).
- HTMLparser.c: close the previous option when opening a new one
  (Marc Sanfacon).
- result/HTML/*: updated the HTML results accordingly
Daniel
2000-10-24 17:10:12 +00:00
Daniel Veillard
7eda8452f8 - HTMLparser.c HTMLtree.[ch] SAX.c testHTML.c tree.c: fixed HTML
support for SCRIPT and STYLE with help from Bjorn Reese
- test/HTML/* result/HTML/*: added simple testcase and updated
  the existing ones.
Daniel
2000-10-14 23:38:43 +00:00