1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00
Commit Graph

13 Commits

Author SHA1 Message Date
Nick Wellnhofer
46f05ea4d5 html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.

Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.

Add full support for <meta charset=""> which is also used when adding
meta tags.

Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.

Only switch encoding once when parsing.

Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.

Fixes #909.
2025-05-11 20:29:25 +02:00
Daniel Veillard
f933c89813 Keep non-significant blanks node in HTML parser
For https://bugzilla.gnome.org/show_bug.cgi?id=681822

Regardless if the option HTML_PARSE_NOBLANKS is set or not, blank nodes
are removed from a HTML document, for example:

<html>
  <head>
    <title>This is a test.</title>
  </head>
  <body>
    <p>This is a test.</p>
  </body>
</html>

is read as:

<html><head><title>This is a test.</title></head><body>
    <p>This is a test.</p>
  </body></html>

This changes the default behaviour but the old behaviour is available
as expected when using the parser flag HTML_PARSE_NOBLANKS

Based on original patch from Igor Ignatyuk <igor_ignatiouk@hotmail.com>

* HTMLparser.c: change various places in the parser where ignorable_space
  SAX callback was called without checking for the parser flag preference
* xmllint.c: make sure we use the new flag even for HTML parsing
* result/HTML/*: this modifies the output of a number of tests
2012-09-07 19:32:12 +08:00
Daniel Veillard
36d73403ff Applied the last patch from Gary Coady for #304637 changing the behaviour
* HTMLparser.c: Applied the last patch from Gary Coady for #304637
  changing the behaviour when text nodes are found in body
* result/HTML/*: this changes the output of some tests
Daniel
2005-09-01 09:52:30 +00:00
Daniel Veillard
42fd412637 change --html to make sure we use the HTML serialization rule by default
* xmllint.c: change --html to make sure we use the HTML serialization
  rule by default when HTML parser is used, add --xmlout to allow to
  force the XML serializer on HTML.
* HTMLtree.c: ugly tweak to fix the output on <p> element and
  solve #125093
* result/HTML/*: this changes the output of some tests
Daniel
2003-11-04 08:47:48 +00:00
Daniel Veillard
8c9872ca2e trying to fix 87235 about discarded white spaces in the HTML parser. this
* HTMLparser.c: trying to fix 87235 about discarded white
  spaces in the HTML parser.
* result/HTML/*: this changes the output of a number of HTML
  regression tests
Daniel
2002-07-05 18:17:10 +00:00
Daniel Veillard
02bb170a8b - HTMLparser.[ch] HTMLtree.c: stored the inline/block property
of element and use it to avoid outputting formatting spaces at
  the wrong place. Implemented the format parameter for HTML save.
- result/HTML/doc2.htm result/HTML/doc3.htm result/HTML/fp40.htm
  result/HTML/script.html result/HTML/test2.html result/HTML/test3.html
  result/HTML/wired.html: of course this impact the result of a
  number of HTML tests
Daniel
2001-06-13 21:11:59 +00:00
Daniel Veillard
32bc74ef98 - doc/encoding.html doc/xml.html: added I18N doc
- encoding.[ch] HTMLtree.[ch] parser.c HTMLparser.c: I18N encoding
  improvements, both parser and filters, added ASCII & HTML,
  fixed the ISO-Latin-1 one
- xmllint.c testHTML.c: added/made visible --encode
- debugXML.c : cleanup
- most .c files: applied patches due to warning on Windows and
  when using Sun Pro cc compiler
- xpath.c : cleanup memleaks
- nanoftp.c : added a TESTING preprocessor flag for standalong
  compile so that people can report bugs more easilly
- nanohttp.c : ditched socklen_t which was a portability mess
  and replaced it with unsigned int.
- tree.[ch]: added xmlHasProp()
- TODO: updated
- test/ : added more test for entities, NS, encoding, HTML, wap
- configure.in: preparing for 2.2.0 release
Daniel
2000-07-14 14:49:25 +00:00
Daniel Veillard
663a607aa4 Fixing one test suite result, Daniel. 2000-07-01 09:08:24 +00:00
Daniel Veillard
71b656e067 - added xmlRemoveID() and xmlRemoveRef()
- added check and handling when possibly removing an ID
- fixed some entities problems
- added xmlParseTryOrFinish()
- changed the way struct aredeclared to allow gtk-doc to expose those
- closed #4960
- fixes to libs detection from Albert Chin-A-Young
- preparing 1.8.3 release
Daniel
2000-01-05 14:46:17 +00:00
Daniel Veillard
4a53eca27c - Updated HTML test outputs
- Fixed taht f....g problem with C++ and includes,
Daniel
1999-12-12 13:03:50 +00:00
Daniel Veillard
af78a0e1b9 Large commit of changes done while travelling to XML'99
- cleanups on memory use and parsers
- start of Link interfaces HTML and XLink
- rebuild the doc
- released as 1.8.0
Daniel
1999-12-12 13:03:50 +00:00
Daniel Veillard
7c1206fc06 Revamped HTML parsing, lots of bug fixes for HTML stuff,
Added xmlValidGetValidElements and xmlValidGetPotentialChildren,
Completed and cleaned up the tests,
Added doc for new modules gnome-xml-xmlmemory.html and gnome-xml-nanohttp.html,
Daniel
1999-10-14 09:10:25 +00:00
Daniel Veillard
424af39124 Added and updated all the results for 1.5.0, Daniel 1999-08-10 19:10:03 +00:00