mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00
118 Commits

Author SHA1 Message Date
Nick Wellnhofer
c1ba6f54d3 Revert "Do not URI escape in server side includes"
This reverts commit 960f0e2756.

This commit introduced

- an infinite loop, found by OSS-Fuzz, which could be easily fixed.
- an algorithm with quadratic runtime
- a security issue, see
  https://bugzilla.gnome.org/show_bug.cgi?id=769760

A better approach is to add an option not to escape URLs at all
which libxml2 should have possibly done in the first place.
2020-08-15 18:32:29 +02:00
Nick Wellnhofer
b79ab6e6d9 Make htmlNodeDumpFormatOutput non-recursive
Fixes stack overflow with deeply nested HTML documents.

Found by OSS-Fuzz.
2020-07-28 03:44:30 +02:00
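Note: the idea behind a non-recursive serializer can be sketched as below. This is a minimal illustration, not the code from this commit: the `Node` type and the `dump_tree` helper are hypothetical stand-ins for libxml2's own structures, and the sketch assumes every node carries `parent`, `children` and `next` pointers. Because the walk climbs back through `parent` instead of returning from nested calls, stack use stays constant no matter how deep the document is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified node type; this is NOT libxml2's xmlNode. */
typedef struct Node {
    const char *name;
    struct Node *parent;
    struct Node *children; /* first child, or NULL */
    struct Node *next;     /* next sibling, or NULL */
} Node;

/* Append s to the NUL-terminated string in out; s is silently
 * dropped if it does not fit in cap bytes (sketch-level handling). */
static void append(char *out, size_t cap, const char *s) {
    size_t used = strlen(out);
    size_t len = strlen(s);
    if (used + len < cap)
        memcpy(out + used, s, len + 1);
}

static void open_tag(char *out, size_t cap, const Node *n) {
    append(out, cap, "<");
    append(out, cap, n->name);
    append(out, cap, ">");
}

static void close_tag(char *out, size_t cap, const Node *n) {
    append(out, cap, "</");
    append(out, cap, n->name);
    append(out, cap, ">");
}

/* Serialize the subtree rooted at root without recursion. */
void dump_tree(const Node *root, char *out, size_t cap) {
    assert(root != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    const Node *cur = root;
    for (;;) {
        open_tag(out, cap, cur);
        if (cur->children != NULL) {
            cur = cur->children; /* descend to the first child */
            continue;
        }
        /* Leaf reached: close it, then climb until a sibling exists. */
        for (;;) {
            close_tag(out, cap, cur);
            if (cur == root)
                return; /* never walk past the starting node */
            if (cur->next != NULL) {
                cur = cur->next;
                break;
            }
            cur = cur->parent;
        }
    }
}
```

The check against `root` comes before the sibling check so that dumping a subtree never spills into the siblings of the starting node.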
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
d459831c1b Fix HTML serialization with UTF-8 encoding
If the encoding is specified as UTF-8, make sure to use a NULL encoding
handler.
2018-10-13 16:47:13 +02:00
Nick Wellnhofer
ee501f5449 Stop using doc->charset outside parser code
doc->charset does not specify the in-memory encoding which is always
UTF-8.
2018-10-13 16:47:01 +02:00
Shaun McCance
7607d9dd45 Allow HTML serializer to output HTML5 DOCTYPE
For https://bugzilla.gnome.org/show_bug.cgi?id=747301

Use simple HTML5 DOCTYPE for about:legacy-compat

HTML5 uses a DOCTYPE without a PUBLIC or SYSTEM identifier. It looks
like this:

<!DOCTYPE html>

I can't use XSLT to output this, because to get a DOCTYPE I have to
provide a PUBLIC or SYSTEM identifier. Luckily, the standards folks
recognized this and provided this semantically equivalent form for the
HTML DOCTYPE:

<!DOCTYPE html SYSTEM "about:legacy-compat">

But people don't like seeing the "legacy" identifier in their output.
They'd rather see the shiny new DOCTYPE. Since we know that
about:legacy-compat is defined by the W3C to be semantically equivalent
to the sans-SYSTEM DOCTYPE, we could just special-case it in the HTML
serializer in libxml2. So if you set the SYSTEM identifier to
"about:legacy-compat", you get an HTML5 short-form DOCTYPE.
2015-04-03 22:52:36 +08:00
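Note: the special case described in this message can be sketched roughly as follows. This is an illustration under assumptions, not the serializer's actual code: `format_doctype` is a hypothetical helper, it takes plain C strings rather than libxml2 types, and it does not handle identifiers that themselves contain a double quote.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a DOCTYPE declaration into out.
 * A SYSTEM identifier of "about:legacy-compat" with no PUBLIC
 * identifier is collapsed to the short HTML5 form.
 * Returns the value of snprintf (length that was or would be written). */
int format_doctype(const char *name, const char *public_id,
                   const char *system_id, char *out, size_t cap) {
    assert(name != NULL && out != NULL && cap > 0);

    if (public_id != NULL && system_id != NULL)
        return snprintf(out, cap, "<!DOCTYPE %s PUBLIC \"%s\" \"%s\">",
                        name, public_id, system_id);
    if (public_id != NULL)
        return snprintf(out, cap, "<!DOCTYPE %s PUBLIC \"%s\">",
                        name, public_id);
    if (system_id != NULL && strcmp(system_id, "about:legacy-compat") != 0)
        return snprintf(out, cap, "<!DOCTYPE %s SYSTEM \"%s\">",
                        name, system_id);

    /* No identifiers, or only the legacy-compat SYSTEM identifier. */
    return snprintf(out, cap, "<!DOCTYPE %s>", name);
}
```

With this shape, an XSLT stylesheet can still request `doctype-system="about:legacy-compat"` to force a DOCTYPE, and the output shows only the short form.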
Romain Bondue
960f0e2756 Do not URI escape in server side includes 2013-04-23 20:44:55 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
7d4c529a33 Improve HTML escaping of attribute on output
Handle special cases of &{...} constructs as hinted in the spec
  http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
2012-09-05 12:11:43 +08:00
Daniel Veillard
7b9b07198f Convert the HTML tree module to the new buffers
The new input buffers induced a couple of changes, the others
are related to the switch to xmlBuf in saving routines.
2012-07-23 14:24:27 +08:00
Daniel Veillard
39d027cdb7 Fix html serialization error and htmlSetMetaEncoding()
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The python tests were reporting errors, some of it was due to
a small change in case encoding, but the main one was about
htmlSetMetaEncoding(doc, NULL) being broken by not removing
the associated meta tag anymore
2012-05-11 12:38:23 +08:00
Daniel Veillard
c62efc847c Add options to ignore the internal encoding
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advance, for example if the content had been converted
before being passed to the parser.

* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
  for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
  HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
  defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
2011-05-26 11:47:37 +08:00
Daniel Veillard
8d7c1b7ab2 582913 Fix htmlSetMetaEncoding() to be nicer
* HTMLtree.c: htmlSetMetaEncoding should not destroy existing meta
  encoding elements, plus it should not change things at all if the
  encoding is the same. Also fixed htmlSaveFileFormat() to ask for
  change if outputting to UTF-8.
2009-08-12 23:03:23 +02:00
Daniel Veillard
74eb54b5b7 575875 don't output charset=html
* HTMLtree.c: don't output charset=html in htmlSetMetaEncoding()
  as this is clearly a libxml2-only thing used for import only
2009-08-12 15:59:01 +02:00
Daniel Veillard
da3fee406d Borland C fix from Moritz Both regenerate, workaround a problem for buffer
* trionan.c: Borland C fix from Moritz Both
* testapi.c: regenerate, workaround a problem for buffer testing
* xmlIO.c HTMLtree.c: new internal entry point to hide even better
  xmlAllocOutputBufferInternal
* tree.c: harden the code around buffer allocation schemes
* parser.c: restore the warning when namespace names are not absolute
  URIs
* runxmlconf.c: continue regression tests if we get the expected
  number of errors
* Makefile.am: run the python tests on make check
* xmlsave.c: handle the HTML documents and trees
* python/libxml.c: convert python serialization to the xmlSave APIs
  and avoid some horrible hacks
Daniel

svn path=/trunk/; revision=3790
2008-09-01 13:08:57 +00:00
Daniel Veillard
fcd02adb71 htmlNodeDumpFormatOutput didn't handle XML_ATTRIBUTE_NODE fixes bug
* HTMLtree.c: htmlNodeDumpFormatOutput didn't handle XML_ATTRIBUTE_NODE
  fixes bug #438390
Daniel

svn path=/trunk/; revision=3631
2007-06-12 09:49:40 +00:00
Rob Richards
417b74d0b1 Add linefeeds to error messages allowing for consistent handling.
* HTMLtree.c xmlsave.c: Add linefeeds to error messages allowing
  for consistent handling.
2006-08-15 23:14:24 +00:00
Rob Richards
77b92ff6a8 fix bug #322136 in xmlNodeBufGetContent when entity ref is a child of an
* tree.c: fix bug #322136 in xmlNodeBufGetContent when entity ref is
  a child of an element (fix by Oleksandr Kononenko).
* HTMLtree.c include/libxml/HTMLtree.h: Add htmlDocDumpMemoryFormat.
2005-12-20 15:55:14 +00:00
Daniel Veillard
b8c8016044 fixed bug #310333 with a patch close to the provided patch for HTML UTF-8
* HTMLtree.c: fixed bug #310333 with a patch close to the provided
  patch for HTML UTF-8 serialization
* result/HTML/script2.html: this changed the output of that test
Daniel
2005-08-08 13:46:45 +00:00
Daniel Veillard
5d4644ef6e revamped the elfgcchack.h format to cope with gcc4 change of aliasing
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
  format to cope with gcc4 change of aliasing allowed scopes, had
  to add extra information to doc/libxml2-api.xml to separate
  the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
  and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
2005-04-01 13:11:58 +00:00
Daniel Veillard
aa9a983dbd fixing bug 168196, <a name=""> must be URI escaped too Daniel
* HTMLtree.c: fixing bug 168196, <a name=""> must be URI escaped too
Daniel
2005-03-29 20:30:17 +00:00
Daniel Veillard
d5cc0f7f51 augmented types supported a number of new bug fixes and documentation
* gentest.py testapi.c: augmented types supported
* HTMLtree.c tree.c xmlreader.c xmlwriter.c: a number of new
  bug fixes and documentation updates.
Daniel
2004-11-06 19:24:28 +00:00
Daniel Veillard
ce244ad595 fixed the way the generator works, extended the testing, especially with
* gentest.py testapi.c: fixed the way the generator works,
  extended the testing, especially with more real trees and nodes.
* HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch
  of real problems found and fixed.
* entities.c: fix error reporting to go through the new handlers
Daniel
2004-11-05 10:03:46 +00:00
Daniel Veillard
3d97e669ec extending the tests coverage more fixes and cleanups Daniel
* gentest.py testapi.c: extending the tests coverage
* HTMLtree.c tree.c xmlsave.c xpointer.c: more fixes and cleanups
Daniel
2004-11-04 10:49:00 +00:00
Daniel Veillard
36e5cd5064 adding xmlMemBlocks() work on generator of an automatic API regression
* xmlmemory.c include/libxml/xmlmemory.h: adding xmlMemBlocks()
* Makefile.am gentest.py testapi.c: work on generator of an
  automatic API regression test tool.
* SAX2.c nanoftp.c parser.c parserInternals.c tree.c xmlIO.c
  xmlstring.c: various API hardening changes as a result of running
  the first set of automatic API regression tests.
* test/slashdot16.xml: apparently missing from CVS, committed it
Daniel
2004-11-02 14:52:23 +00:00
William M. Brack
13dfa87e91 added the routine xmlNanoHTTPContentLength to the external API
* nanohttp.c, include/libxml/nanohttp.h: added the routine
  xmlNanoHTTPContentLength to the external API (bug151968).
* parser.c: fixed unnecessary internal error message (bug152060);
  also changed call to strncmp over to xmlStrncmp.
* encoding.c: fixed compilation warning (bug152307).
* tree.c: fixed segfault in xmlCopyPropList (bug152368); fixed
  a couple of compilation warnings.
* HTMLtree.c, debugXML.c, xmlmemory.c: fixed a few compilation
  warnings; no change to logic.
2004-09-18 04:52:08 +00:00
Daniel Veillard
42fd412637 change --html to make sure we use the HTML serialization rule by default
* xmllint.c: change --html to make sure we use the HTML serialization
  rule by default when HTML parser is used, add --xmlout to allow to
  force the XML serializer on HTML.
* HTMLtree.c: ugly tweak to fix the output on <p> element and
  solve #125093
* result/HTML/*: this changes the output of some tests
Daniel
2003-11-04 08:47:48 +00:00
William M. Brack
76e95df055 Changed all (?) occurrences where validation macros (IS_xxx) had
* include/libxml/parserInternals.h HTMLparser.c HTMLtree.c
  SAX2.c catalog.c debugXML.c entities.c parser.c relaxng.c
  testSAX.c tree.c valid.c xmlschemas.c xmlschemastypes.c
  xpath.c: Changed all (?) occurrences where validation macros
  (IS_xxx) had single-byte arguments to use IS_xxx_CH instead
  (e.g. IS_BLANK changed to IS_BLANK_CH).  This gets rid of
  many warning messages on certain platforms, and also highlights
  places in the library which may need to be enhanced
  for proper UTF8 handling.
2003-10-18 16:20:14 +00:00
Daniel Veillard
e2238d5617 converted too small cleanup Daniel
* HTMLtree.c include/libxml/xmlerror.h: converted too
* tree.c: small cleanup
Daniel
2003-10-09 13:14:55 +00:00
Daniel Veillard
a9cce9cd0d Okay this is scary but it is just adding a configure option to disable
* HTMLtree.c SAX2.c c14n.c catalog.c configure.in debugXML.c
  encoding.c entities.c nanoftp.c nanohttp.c parser.c relaxng.c
  testAutomata.c testC14N.c testHTML.c testRegexp.c testRelax.c
  testSchemas.c testXPath.c threads.c tree.c valid.c xmlIO.c
  xmlcatalog.c xmllint.c xmlmemory.c xmlreader.c xmlschemas.c
  example/gjobread.c include/libxml/HTMLtree.h include/libxml/c14n.h
  include/libxml/catalog.h include/libxml/debugXML.h
  include/libxml/entities.h include/libxml/nanohttp.h
  include/libxml/relaxng.h include/libxml/tree.h
  include/libxml/valid.h include/libxml/xmlIO.h
  include/libxml/xmlschemas.h include/libxml/xmlversion.h.in
  include/libxml/xpathInternals.h python/libxml.c:
  Okay this is scary but it is just adding a configure option
  to disable output, this touches most of the files.
Daniel
2003-09-29 13:20:24 +00:00
William M. Brack
3a6da760c5 Fixed bug 121394 - missing ns on attributes
* HTMLtree.c: Fixed bug 121394 - missing ns on attributes
2003-09-15 04:58:14 +00:00
Daniel Veillard
70bcb0ea24 hum try to avoid some troubles when the library is not initialized and one
* HTMLtree.c tree.c threads.c: hum try to avoid some troubles
  when the library is not initialized and one tries to save, the
  locks in threaded env might not have been initialized, playing safe
* xmlschemastypes.c: apply patch for hexBinary from Charles Bozeman
* test/schemas/hexbinary_* result/schemas/hexbinary_*: also added
  his tests to the regression suite.
Daniel
2003-08-08 14:00:28 +00:00
Daniel Veillard
5f5b7bb78e fixing bug #112904: html output method escaped plus sign character in URI
* HTMLtree.c: fixing bug #112904: html output method escaped
  plus sign character in URI attribute.
Daniel
2003-05-16 17:19:40 +00:00
Daniel Veillard
645c690d49 patch from Vasily Tchekalkin to fix #109865 Daniel
* HTMLtree.c: patch from Vasily Tchekalkin to fix #109865
Daniel
2003-04-10 21:40:49 +00:00
Daniel Veillard
c7e9b194e7 Fixed reopening of #78662 <form action="..."> is an URI reference Daniel
* HTMLtree.c: Fixed reopening of #78662 <form action="...">
  is an URI reference
Daniel
2003-03-27 14:08:24 +00:00
Daniel Veillard
04ee2f2d00 avoid escaping ',' in URIs Daniel
* HTMLtree.c: avoid escaping ',' in URIs
Daniel
2003-03-23 20:31:46 +00:00
Daniel Veillard
5ecaf7f9a7 fixes #102920 about namespace handling in HTML output and section 16.2
* HTMLtree.c tree.c: fixes #102920 about namespace handling in
  HTML output and section 16.2 "HTML Output Method" of XSLT-1.0
* README: fixed a link
Daniel
2003-01-09 13:19:33 +00:00
Daniel Veillard
024b57019f patch from Mark Vadok about htmlNodeDumpOutput location. removed an
* HTMLtree.c include/libxml/HTMLtree.h: patch from Mark Vadok
  about htmlNodeDumpOutput location.
* xpath.c: removed an undefined function signature
* doc/apibuild.py doc/libxml2-api.xml: the script was exporting
  too many symbols in the API breaking the python bindings.
  Updated with the libxslt/libexslt changes.
Daniel
2002-12-12 00:15:55 +00:00
Daniel Veillard
8db67d2704 applied the same kind of refactoring to the HTML saving code. slight API
* HTMLtree.c include/libxml/HTMLtree.h: applied the same kind
  of refactoring to the HTML saving code.
* doc/libxml2-*.xml doc/API*.html: slight API changes got reflected
  in the doc.
Daniel
2002-11-27 19:39:27 +00:00
Daniel Veillard
44892f73dd fixed serialization of script and style when they are not lowercase (i.e.
* HTMLtree.c: fixed serialization of script and style when
  they are not lowercase (i.e. added using the API to the tree).
Daniel
2002-10-16 15:23:26 +00:00
Daniel Veillard
abe0174442 fixing bug #94241 on HTML boolean attributes Daniel
* HTMLtree.c: fixing bug #94241 on HTML boolean attributes
Daniel
2002-09-26 12:40:03 +00:00
Daniel Veillard
ad11b301ab small cleanup of the man page fixed a potential problem raised by Petr
* libxml.3: small cleanup of the man page
* HTMLtree.c: fixed a potential problem raised by Petr Vandrovec
  when serializing HREF attributes generated by XSLT.
Daniel
2002-08-12 14:53:41 +00:00
Daniel Veillard
c084e47841 integrated a cleaned up version of Marc Liyanage's patch for boolean
* HTMLtree.c include/libxml/HTMLtree.h: integrated a cleaned up
  version of Marc Liyanage's patch for boolean attributes in HTML
  output
Daniel
2002-08-12 13:27:28 +00:00
Daniel Veillard
0b22defa31 trying to fix the <style> escaping problem in HTML serialization bug
* HTMLtree.c: trying to fix the <style> escaping problem in
  HTML serialization bug #89342
Daniel
2002-07-29 16:23:03 +00:00
Daniel Veillard
3a42f3fe30 changed the order of the encoding declaration attributes in the meta tags
* HTMLtree.c: changed the order of the encoding declaration
  attributes in the meta tags due to a bug in IE/Mac
Daniel
2002-07-17 17:57:34 +00:00
Daniel Veillard
6231e84559 fixed & serialization bug introduced in 2.4.20 this changes a few things
* HTMLtree.c: fixed & serialization bug introduced in 2.4.20
* result/HTML/*: this changes a few things in the results
Daniel
2002-04-18 11:54:04 +00:00
Daniel Veillard
eb475a37df fixing bug #78662 i.e. add proper escaping of URI when saving HTML files.
* HTMLtree.c uri.c: fixing bug #78662 i.e. add proper
  escaping of URI when saving HTML files.
* result/HTML/*: this impacted some tests
Daniel
2002-04-14 22:00:22 +00:00
Daniel Veillard
34ce8bece2 preparing 2.4.18 updated and rebuilt the web site implement the new
* configure.in: preparing 2.4.18
* doc/*: updated and rebuilt the web site
* *.c libxml.h: implement the new IN_LIBXML scheme discussed with
  the Windows and Cygwin maintainers.
* parser.c: humm, changed the way the SAX parser works when
  xmlSubstituteEntitiesDefault(1) is set, it will then
  do the entity registration and loading by itself in case the
  user provided SAX getEntity() returns NULL.
* testSAX.c: added --noent to test the behaviour.
Daniel
2002-03-18 19:37:11 +00:00
Daniel Veillard
9ff8817e67 Fixing #74186, made sure all boolean expressions get fully parenthesized,
* c14n.c: Fixing #74186, made sure all boolean expressions
  get fully parenthesized, ran indent on the output
* configure.in HTMLtree.c SAX.c c14n.c debugXML.c tree.c xpointer.c
  include/libxml/tree.h: also #74186 related, removed the
  --with-buffers option, and all the preprocessor conditional
  sections that were resulting from it.
Daniel
2002-03-11 09:15:32 +00:00