mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

7744 Commits

Author SHA1 Message Date
Nick Wellnhofer
4df8d55742 io: Fix stack use after scope
Short-lived regression.
2025-05-12 17:31:14 +02:00
Nick Wellnhofer
f0983199e8 html: Map some encodings according to HTML5
Windows-1252 is a superset of ISO-8859-1 and should be used instead.
Same for ASCII.

Also map UCS-2 and UTF-16 to UTF-16LE.
2025-05-12 14:04:30 +02:00
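
As a rough illustration of the mapping described in f0983199e8, the sketch below redirects a few encoding labels to the encodings HTML5 prescribes. The names `label_equals`, `html5_map` and `html5_map_encoding` are hypothetical and are not taken from libxml2. The table covers only the cases named in the commit message, and it assumes "ASCII" means the labels `ASCII` and `US-ASCII`.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: ASCII case-insensitive string equality. */
static int
label_equals(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/*
 * Hypothetical table. It lists only the mappings mentioned in the
 * commit message, not the full set of HTML5 encoding labels.
 */
static const struct {
    const char *label;
    const char *target;
} html5_map[] = {
    { "ISO-8859-1", "windows-1252" },
    { "ASCII",      "windows-1252" },
    { "US-ASCII",   "windows-1252" },
    { "UCS-2",      "UTF-16LE" },
    { "UTF-16",     "UTF-16LE" },
};

/*
 * Return the encoding to use for an HTML document labeled with
 * `label`. Labels without an entry in the table are returned as-is.
 */
const char *
html5_map_encoding(const char *label) {
    size_t i;

    for (i = 0; i < sizeof(html5_map) / sizeof(html5_map[0]); i++) {
        if (label_equals(label, html5_map[i].label))
            return html5_map[i].target;
    }
    return label;
}
```

With this sketch, `html5_map_encoding("iso-8859-1")` yields `windows-1252`, while a label such as `UTF-8` passes through unchanged.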
Nick Wellnhofer
93f671064e encoding: Add HTML5 aliases 2025-05-12 13:27:29 +02:00
Nick Wellnhofer
628006f457 encoding: Add windows-1252
Fixes #915.
2025-05-12 13:27:22 +02:00
Nick Wellnhofer
a7016baea6 tools: Remove unnecessary data from iso8859x.inc 2025-05-12 13:14:21 +02:00
Nick Wellnhofer
c92374f1b8 tools: Recreate script to generate iso8859x.inc
The script to create these tables was never committed to version
control.
2025-05-12 13:14:21 +02:00
Nick Wellnhofer
f602c0c186 html: Rework serialization of meta encoding attributes
Don't allocate memory.
2025-05-12 00:05:02 +02:00
Nick Wellnhofer
7654c2efc0 html: Rework serialization of URIs
Don't allocate memory.
2025-05-12 00:04:00 +02:00
Nick Wellnhofer
bd777e4f42 html: Speed up htmlIsBooleanAttr
This is used when serializing.
2025-05-11 23:28:40 +02:00
Nick Wellnhofer
825f3a9d0c html: Always serialize attributes with double quotes
Align with HTML5.
2025-05-11 21:42:51 +02:00
Nick Wellnhofer
5c4cc456a4 html: Escape encoding in meta tags 2025-05-11 21:30:30 +02:00
Nick Wellnhofer
0674ccb7cb html: Stop omitting end tags when serializing
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
05b8fe0a06 html: Don't escape RAWTEXT and PLAINTEXT
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
809ded586b html: Add more empty elements
Add empty HTML5 elements <bgsound>, <keygen>, <source>, <track> and
<wbr>.

Make <embed> an empty element.
2025-05-11 20:46:50 +02:00
Nick Wellnhofer
5f8ebc8809 save: Avoid xmlOutputBufferWriteQuotedString
xmlOutputBufferWriteQuotedString should be reserved for things like
system IDs.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
0d81d6f811 html: Use xmlOutputBufferWrite if possible 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
89fcfe3a29 html: Start to use xmlSerializeText
Avoid temporary copy to speed up serialization.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
777e2adf77 io: Consolidate escaping code
Use generated table approach of xmlSerializeText for xmlEscapeText.

Move most code to xmlIO.c.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
cdaf657ffb html: Don't escape < and > when serializing attribute values
Align with HTML5.

This will break some test suites.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
e0e0a1f0f5 html: Remove special handling of &{...} when serializing
See https://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1

Align with HTML5.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
dad1163078 entities: Always replace invalid chars when escaping
The previous refactor painstakingly recreated the different behavior of
separate functions that were merged. It makes

Optimize IS_CHAR check for non-ASCII chars.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
c8cea39d8a save: Fix serialization of attribute defaults containing &lt;
Long-standing bug that produced invalid XML.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
971038e59f html: Call lower-level escaping functions
Removes the need to pass a document around.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
63535d3922 tree: Make xmlNodeListGetStringInternal work with escape flags 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
442c1903af doc: Fix some damage from automated conversions
Add some newlines, fix returns.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
98a61c9dff doc: Fix briefs in tree docs 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
4b4bc15acf doc: Misc fixes to buffer docs 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
ad390a5d14 parser: Set doc properties in endDocument SAX handler 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
c7c4964342 html: Move DTD creation to endDocument SAX callback 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
46f05ea4d5 html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.

Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.

Add full support for <meta charset=""> which is also used when adding
meta tags.

Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.

Only switch encoding once when parsing.

Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.

Fixes #909.
2025-05-11 20:29:25 +02:00
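
The "algorithm for extracting a character encoding from a meta element" mentioned in 46f05ea4d5 can be sketched roughly as follows. This is a minimal reading of the steps in the HTML Standard, not the code from the commit. The names `is_html_space`, `matches_charset` and `extract_meta_charset` are hypothetical. The function returns a pointer to the encoding label inside the `content` attribute value and stores its length, so only that substring would need to be rewritten.

```c
#include <stddef.h>
#include <string.h>

/* ASCII whitespace as defined by the HTML Standard. */
static int
is_html_space(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Check whether `s` starts with "charset", ignoring ASCII case. */
static int
matches_charset(const char *s) {
    static const char word[] = "charset";
    size_t i;

    for (i = 0; i < sizeof(word) - 1; i++) {
        char c = s[i];

        if (c >= 'A' && c <= 'Z')
            c = (char) (c - 'A' + 'a');
        /* Also stops at the terminating NUL of `s`. */
        if (c != word[i])
            return 0;
    }
    return 1;
}

/*
 * Hypothetical sketch: locate the encoding label in the value of a
 * Content-Type meta tag. Returns a pointer into `content` and stores
 * the label length in `*len`, or returns NULL if no label is found.
 */
const char *
extract_meta_charset(const char *content, size_t *len) {
    const char *p = content;

    *len = 0;
    for (;;) {
        const char *end;

        /* Find the next occurrence of "charset". */
        while (*p != '\0' && !matches_charset(p))
            p++;
        if (*p == '\0')
            return NULL;
        p += 7;

        while (is_html_space(*p))
            p++;
        /* Without '=', resume the search at this character. */
        if (*p != '=')
            continue;
        p++;
        while (is_html_space(*p))
            p++;

        if (*p == '\0')
            return NULL;
        if (*p == '"' || *p == '\'') {
            /* Quoted value: an unmatched quote means no result. */
            end = strchr(p + 1, *p);
            if (end == NULL)
                return NULL;
            *len = (size_t) (end - (p + 1));
            return p + 1;
        }

        /* Unquoted value: ends at whitespace, ';' or end of string. */
        end = p;
        while (*end != '\0' && !is_html_space(*end) && *end != ';')
            end++;
        *len = (size_t) (end - p);
        return p;
    }
}
```

For the input `text/html; charset=utf-8`, the sketch returns a pointer to `utf-8` with a length of 5. For `text/html`, it returns NULL.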
Nick Wellnhofer
9aaa52fe48 tree: Make xmlNodeAddContent work with attributes 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
655ac5f851 html: Add comment regarding hack for XML documents 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
f3a080bc48 html: Ignore U+0000 in body text
Align with HTML5. Fixes #908.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
a1e83b2401 io: Fix negation of potentially unsigned value 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
b3854fe964 reader: Fix null deref on malloc failure
Short-lived regression from 177067ea.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
6684eb9350 fuzz: Fix out-of-tree build 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
6bd380ce1c fuzz: Update README 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
967df734c5 malloc-fail: Handle malloc failure in xmlSchemaCopyValue
Avoid null pointer dereference. Fixes #905.
2025-05-11 20:29:25 +02:00
Pavel Kopylov
4ed7157406 python: fix use-after-free in functions xmlPythonFileReadRaw(), xmlPythonFileRead()
with python2.

Fixes #910.
2025-05-09 11:58:01 +02:00
Nick Wellnhofer
38ea8fa9de doc: Fix varargs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
9bbffec568 doc: Move brief to top, params to bottom of doc comments 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
7bc7ae9db3 doc: Enable Doxygen autobrief 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
ab13fbfd68 doc: Misc fixes to error docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b1685459a3 doc: Misc fixes to xmlsave docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
7d689fabda doc: Fix doc installation with Autotools 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
7b59e74c5f doc: Always use case sensitive filenames with Doxygen
Avoid platform-specific behavior.
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
298f70b3d7 doc: Misc fixes to HTML tree docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
18d20a68bc doc: More fine-grained redirects for old pages 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
80b6429fb3 doc: Misc fixes to encoding docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
81ac2e27fd doc: Misc fixes to valid docs 2025-05-06 19:51:38 +02:00