Mirror of https://gitlab.gnome.org/GNOME/libxml2.git (synced 2025-10-27 12:15:34 +03:00)
For https://bugzilla.gnome.org/show_bug.cgi?id=655218

http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element
"""
The charset attribute specifies the character encoding used by the document. This is a character encoding declaration. If the attribute is present in an XML document, its value must be an ASCII case-insensitive match for the string "UTF-8" (and the document is therefore forced to use UTF-8 as its encoding).
"""

However, while <meta http-equiv="Content-Type" content="text/html; charset=utf8"> works, <meta charset="utf8"> does not. While the libxml2 HTML parser is not tuned for HTML5, this is a simple addition.

Also added a test case.
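The two declaration forms mentioned in the commit message can be illustrated with a minimal sketch. It uses Python's standard `html.parser` module rather than libxml2, so it does not show how libxml2 implements the check; the names `MetaCharsetSniffer` and `sniff_charset` are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser


class MetaCharsetSniffer(HTMLParser):
    """Hypothetical helper: records the first charset declared by a <meta> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Only the first declaration found is kept.
        if tag != "meta" or self.charset is not None:
            return
        # Attribute names arrive lowercased; valueless attributes have value None.
        attributes = {name: (value or "") for name, value in attrs}
        if "charset" in attributes:
            # HTML5 short form: <meta charset="...">
            self.charset = attributes["charset"].strip().lower() or None
        elif attributes.get("http-equiv", "").lower() == "content-type":
            # Legacy form: content="text/html; charset=..."
            for part in attributes.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip().strip("\"'").lower() or None
                    break


def sniff_charset(markup):
    """Return the declared charset in lowercase, or None if no <meta> declares one."""
    sniffer = MetaCharsetSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.charset
```

Calling `sniff_charset('<meta charset="iso-8859-1">')` exercises the short form, which is the case the trace in this file covers. The sketch returns the label as written and does not map aliases such as `utf8` to a canonical encoding name.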
31 lines
554 B
Plaintext
SAX.setDocumentLocator()
SAX.startDocument()
SAX.startElement(html)
SAX.ignorableWhitespace(
, 1)
SAX.startElement(head)
SAX.ignorableWhitespace(
, 1)
SAX.startElement(meta, charset='iso-8859-1')
SAX.endElement(meta)
SAX.ignorableWhitespace(
, 1)
SAX.endElement(head)
SAX.ignorableWhitespace(
, 1)
SAX.startElement(body)
SAX.characters(
, 3)
SAX.startElement(p)
SAX.characters(très, 5)
SAX.endElement(p)
SAX.characters(
, 1)
SAX.endElement(body)
SAX.ignorableWhitespace(
, 1)
SAX.endElement(html)
SAX.ignorableWhitespace(
, 1)
SAX.endDocument()