1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-26 00:37:43 +03:00
Commit Graph

557 Commits

Author SHA1 Message Date
Daniel Veillard
c62efc847c Add options to ignore the internal encoding
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advace for example if the content had been converted
before being passed to the parser.

* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
  for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
  HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
  defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
2011-05-26 11:47:37 +08:00
Denis Pauk
91d239c5cf 617468 fix progressive HTML parsing with style using "'"
Style and script can contain ',"". This patch fixes call
htmlParseLookupSequence with set flag 'ignoreattrval' to
ignore this char
2010-11-04 12:39:18 +01:00
Pierre Belzile
d4b5447141 614005 Possible erroneous HTML parsing on unterminated script
Fix a nasty error handling problem when an error happen at the
end of the input buffer.
2010-11-04 10:18:17 +01:00
Daniel Veillard
8ad2930f62 make sure htmlCtxtReset do reset the disableSAX field
As pointed out by Stefan Behnel <stefan_ml@behnel.de>
2010-10-28 11:51:22 +02:00
Michael Day
af58ee130f Fix a couple of typo in HTML parser error messages 2010-08-02 13:43:28 +02:00
Daniel Veillard
f1121c48af Add an HTML parser option to avoid a default doctype
- include/libxml/HTMLparser.h: defines the new HTML parser option
  HTML_PARSE_NODEFDTD
- HTMLparser.c: if option is set don't add a default DTD
- xmllint.c: add the corresponding --nodefdtd option in xmllint
2010-07-26 14:02:42 +02:00
Daniel Veillard
06c93b7509 Remove a few warnings 2010-03-15 16:08:44 +01:00
Daniel Veillard
3c080d6d72 Don't give default HTML boolean attribute values in parser
* HTMLparser.c: don't default value of HTML boolean attributes in the
  parser
* SAX2.c: move this to SAX2 tree building backend
* result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax
  result/HTML/wired.html.sax: this changes a few HTML SAX regression
  tests
2010-03-15 15:47:50 +01:00
Eugene Pimenov
615904f582 Switch the HTML parser to be non-recursive
* HTMLparser.c: new htmlParseElementInternal non recursive, with
  htmlParseContentInternal and new function to handle node info
  and element end.
* include/libxml/parser.h: add new stack for element info in parser
  context
* parserInternals.c: fee element info stack
2010-03-15 15:16:02 +01:00
Eugene Pimenov
ef9c636ac1 Cleanup a couple of weirdness in HTML parser 2010-03-15 11:37:48 +01:00
Eugene Pimenov
1e60fbcb6f htmlCheckEncoding doesn't update input-end after shrink
* HTMLparser.c: add the missing update to the end pointer
2010-03-10 18:10:49 +01:00
Daniel Veillard
e20fb5a72c Fix xmlParseInNodeContext for HTML content
xmlParseInNodeContext notices that the enclosing document is
an HTML document, so invoke the HTML parser for that fragment, and
the HTML parser finding a "<p>hello world!</p>" document automatically
augment it with defaulted <html> and <body>. This defaulting should
be turned off in the HTML parser for this to work, but there is no
such HTML parser option. There is an htmlOmittedDefaultValue global
variable that you could use, but really we should not rely on global
variable for processing options anymore, best is to add an
HTML_PARSE_NOIMPLIED.
* include/libxml/HTMLparser.h: add the HTML_PARSE_NOIMPLIED parser flag
* HTMLparser.c: do add implied element if HTML_PARSE_NOIMPLIED is set
* parser.c: add HTML_PARSE_NOIMPLIED to options for xmlParseInNodeContext
  on HTML documents
2010-01-29 20:47:08 +01:00
Eugene Pimenov
4b41f15dcd Fix some missing commas in HTML element lists
* HTMLparse.c: fix the macros BLOCK and INLINE to use commas and
  avoid transparent contatenation of strings
2010-01-20 14:25:59 +01:00
Daniel Veillard
13cee4e37b Fix a bunch of scan 'dead increments' and cleanup
* HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c
  testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c:
  fix unused variables, or unneeded increments as well as a couple
  of space issues
* runtest.c: check for NULL before calling unlink()
2009-09-05 14:52:55 +02:00
Daniel Veillard
eeb9932990 444994 HTML chunked failure for attribute with <>
* HTMLparser.c: fix htmlParseLookupSequence to not save ctxt->checkIndex
  when the current buffer ends within an attribute value, as this
  information would be missed in next pass.
2009-08-25 14:42:16 +02:00
Adiel Mittmann
8a103793f2 Non ASCII character may be split at buffer end
* HTMLparser.c: make sure when we call xmlParserInputGrow in
  htmlCurrentChar, to reset the current pointer
2009-08-25 11:27:13 +02:00
Markus Kull
56a03035bf 572129 speed up parasing of large HTML text nodes
* HTMLparser.c: use a different lookup function htmlParseLookupChars()
  to avoid the quadratic behaviour
2009-08-24 19:00:23 +02:00
Daniel Veillard
b468f7444c Remove a pedantic warning 2009-08-24 18:45:33 +02:00
Daniel Veillard
856c668c1a Fix HTML parsing with 0 character in CDATA
* HTMLparser.c: 0 before the end of the input need some special case
  handling, raise the error and return a space instead
2009-08-24 18:16:56 +02:00
Daniel Veillard
029a04d265 541335 HTML avoid creating 2 head or 2 body element
* HTMLparser.c: check when we see an head or a body tag and avoid
  autogenerating them
* include/libxml/parser.h: the values for ctxt->html change depending
  on the head or body tags being seen
2009-08-24 12:50:23 +02:00
Daniel Veillard
6339c1a886 541237 error correcting missing end tags in HTML
* HTMLparser.c: make sure /p closes the FONTSTYLE list of elements
2009-08-24 11:59:51 +02:00
Daniel Veillard
db4ac221f0 Fix a small problem on previous HTML parser patch 2009-08-22 17:58:31 +02:00
Daniel Veillard
e77db16ab1 592430 - HTML parser runs into endless loop
* HTMLparser.c: fix the problem with detection erroring absolutely, and
  properly popping up the stack when in EOF, also passes XML_PARSE_HUGE
  when decoding options.
2009-08-22 11:32:38 +02:00
Daniel Veillard
7459c595a0 588441 allow '.' in HTML Names even if invalid
* HTMLparser.c: just allow '.' in htmlParseHTMLName list of characters
2009-08-13 10:10:29 +02:00
Daniel Veillard
533ec0e073 579317 Try to find the HTML encoding information
* HTMLparser.c: if we hit an encoding error before parsing a potential
  <meta> with the info look in the input buffer to see if we can find
  it instead of forcing a blind switch to ISO-8859-1
2009-08-12 23:00:22 +02:00
Jiri Netolicky
446e126de5 576368 – htmlChunkParser with special attributes
* HTMLparser.c: htmlChunkParsing failed when the chunk ends inside
  element after some attribute which  has a '>' char in its value.
2009-08-07 17:05:36 +02:00
Daniel Veillard
4d3e2da7f8 * HTMLparser.c: make sure we keep line numbers fixes #580705
based Aaron Patterson patch
Daniel
2009-05-15 17:55:45 +02:00
Roland Steiner
04f8eef852 * HTMLparser.c: a broken HTML table attributes initialization,
fixes #581803, by Roland Steiner <rolandsteiner@google.com>
Daniel
2009-05-12 09:16:16 +02:00
Daniel Veillard
7f4547cdbd preparing the release of 2.7.2 fix the Solaris portability issue
* configure.in doc/* NEWS: preparing the release of 2.7.2
* dict.c: fix the Solaris portability issue
* parser.c: additional cleanup on #554660 fix
* test/ent13 result/ent13* result/noent/ent13*: added the
  example in the regression test suite.
* HTMLparser.c: handle leading BOM in htmlParseElement()
Daniel

svn path=/trunk/; revision=3799
2008-10-03 07:58:23 +00:00
Daniel Veillard
a57ba4ce96 fix an HTML parsing error on large data sections reported by Mike Day add
* HTMLparser.c: fix an HTML parsing error on large data sections
  reported by Mike Day
* test/HTML/utf8bug.html result/HTML/utf8bug.html.err
  result/HTML/utf8bug.html.sax result/HTML/utf8bug.html: add the
  reproducer to the test suite
daniel

svn path=/trunk/; revision=3797
2008-09-25 16:06:18 +00:00
Daniel Veillard
4cc67bb77e patch from Robert Schwebel , allows to compile the example if configured
* doc/examples/reader3.c: patch from  Robert Schwebel , allows to
  compile the example if configured without output support fixes
  #545582
* Makefile.am: add testrecurse to the make check tests
* HTMLparser.c: if the parser got a encoding argument it should be
  used over what the meta specifies, patch fixing #536346
Daniel

svn path=/trunk/; revision=3785
2008-08-29 19:58:23 +00:00
Daniel Veillard
ae0765b681 more progresses against the official regression tests small cleanup for
* runxmlconf.c: more progresses against the official regression tests
* runsuite.c: small cleanup for non-leak reports
* include/libxml/tree.h: parsing flags and other properties are
  now added to the document node, this is generally useful and
  allow to make Name and NmToken validations based on the parser
  flags, more specifically the 5th edition of XML or not
* HTMLparser.c tree.c: small side effects for the previous changes
* parser.c SAX2.c valid.c: the bulk of teh changes are here,
  the parser and validation behaviour can be affected, parsing
  flags need to be copied, lot of changes. Also fixing various
  validation problems in the regression tests.
Daniel

svn path=/trunk/; revision=3762
2008-07-31 19:54:59 +00:00
Daniel Veillard
ed86dc2383 applied patch from Ashwin fixing a number of realloc problems improve
* uri.c: applied patch from Ashwin fixing a number of realloc problems
* HTMLparser.c: improve handling for misplaced html/head/body
Daniel

svn path=/trunk/; revision=3740
2008-04-24 11:58:41 +00:00
Daniel Veillard
36de63e71d apparently it's okay to forget the semicolumn after entity refs in HTML,
* HTMLparser.c: apparently it's okay to forget the semicolumn after
  entity refs in HTML, fixing char refs parsing accordingly based on
  T. Manske patch, this should fix #517653
Daniel

svn path=/trunk/; revision=3726
2008-04-03 09:05:05 +00:00
Daniel Veillard
35fcbb84d2 patch from Arnold Hendriks improving parsing of html within html bogus
* HTMLparser.c: patch from Arnold Hendriks improving parsing of
  html within html bogus data, still not a complete fix though
Daniel

svn path=/trunk/; revision=3704
2008-03-12 21:43:39 +00:00
Daniel Veillard
c5b43cc03a avoid stopping parsing when encountering out of range characters in an
* HTMLparser.c: avoid stopping parsing when encountering
  out of range characters in an HTML file, report and 
  continue processing instead, should fix #472696
Daniel

svn path=/trunk/; revision=3675
2008-01-11 07:41:39 +00:00
Daniel Veillard
640f89ef61 fix definition for <embed> to avoid error when saving back, patch from
* HTMLparser.c: fix definition for <embed> to avoid error
  when saving back, patch from Stefan Behnel fixing 495213
Daniel

svn path=/trunk/; revision=3671
2008-01-11 06:24:09 +00:00
Daniel Veillard
861101d1fa fixed bug #381877, avoid reading over the end of stream when generating an
* HTMLparser.c: fixed bug #381877, avoid reading over the end
  of stream when generating an UTF-8 encoding error.
Daniel

svn path=/trunk/; revision=3627
2007-06-12 08:38:57 +00:00
Daniel Veillard
491e58e575 applied patch from Michael Day to add support for <embed> Daniel
* HTMLparser.c: applied patch from Michael Day to add support for <embed>
Daniel

svn path=/trunk/; revision=3611
2007-05-02 16:15:18 +00:00
Daniel Veillard
739e9d0981 Dohh !
Daniel

svn path=/trunk/; revision=3610
2007-04-27 09:33:58 +00:00
Daniel Veillard
4d1320fa5b Jean-Daniel Dupas pointed a couple of problems in htmlCreateDocParserCtxt.
* HTMLparser.c: Jean-Daniel Dupas pointed a couple of problems
  in htmlCreateDocParserCtxt.
Daniel

svn path=/trunk/; revision=3609
2007-04-26 08:55:33 +00:00
Daniel Veillard
42720248e6 change the way script/style are parsed to not try to detect comments,
* HTMLparser.c: change the way script/style are parsed to
  not try to detect comments, reported by Mike Day
* result/HTML/doc3.*: affects the result of that test
Daniel

svn path=/trunk/; revision=3598
2007-04-16 07:02:31 +00:00
William M. Brack
e978ae25ca fixed memory access error on parsing of meta data which had errors (bug
* HTMLparser.c: fixed memory access error on parsing of meta data
  which had errors (bug #382206).  Also cleaned up a few warnings
  by adding some additional DECL macros.

svn path=/trunk/; revision=3593
2007-03-21 06:16:02 +00:00
Daniel Veillard
1032ac4c5c applied patch from Steven Rainwater to fix UTF8ToHtml behaviour on code
* HTMLparser.c: applied patch from Steven Rainwater to fix
  UTF8ToHtml behaviour on code points which are not mappable to
  predefined HTML entities, fixes #377544
Daniel
2006-11-23 16:18:30 +00:00
Daniel Veillard
772869fe10 change htmlCtxtReset() following Michael Day bug report and suggestion.
* HTMLparser.c: change htmlCtxtReset() following Michael Day bug
  report and suggestion.
Daniel
2006-11-08 09:16:56 +00:00
Daniel Veillard
890fd9f9f3 applied a reworked version of Usamah Malik patch to avoid growing the
* HTMLparser.c: applied a reworked version of Usamah Malik patch
  to avoid growing the parser stack in some autoclose cases, should
  fix #361221
Daniel
2006-10-27 12:53:28 +00:00
Daniel Veillard
af616a7386 fix one problem found in htmlCtxtUseOptions() and pointed in #340591
* HTMLparser.c: fix one problem found in htmlCtxtUseOptions()
  and pointed in #340591
Daniel
2006-10-17 20:18:39 +00:00
Daniel Veillard
8a82ae12c3 fixed teh 2 stupid bugs affecting htmlReadDoc() and htmlReadIO() this
* HTMLparser.c: fixed teh 2 stupid bugs affecting htmlReadDoc() and
  htmlReadIO() this should fix #340322
Daniel
2006-10-17 20:04:10 +00:00
Daniel Veillard
c47d263049 fixing HTML minimized attribute values to be generated internally if not
* HTMLparser.c: fixing HTML minimized attribute values to be generated
  internally if not present, fixes bug #332124
* result/HTML/doc2.htm.sax result/HTML/doc3.htm.sax
  result/HTML/wired.html.sax: this affects the SAX event strem for
  a few test cases
Daniel
2006-10-17 16:13:27 +00:00
Daniel Veillard
48519092e5 fixing HTML entities in attributes parsing bug #362552 added to the
* HTMLparser.c: fixing HTML entities in attributes parsing bug #362552
* result/HTML/entities2.html* test/HTML/entities2.html: added to
  the regression suite
Daniel
2006-10-17 15:56:35 +00:00