Nick Wellnhofer
4df8d55742
io: Fix stack use after scope
...
Short-lived regression.
2025-05-12 17:31:14 +02:00
Nick Wellnhofer
f0983199e8
html: Map some encodings according to HTML5
...
Windows-1252 is a superset of ISO-8859-1 and should be used instead.
Same for ASCII.
Also map UCS-2 and UTF-16 to UTF-16LE.
2025-05-12 14:04:30 +02:00
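The mapping described in f0983199e8 can be pictured as a small lookup table. The following is only a sketch under stated assumptions, not the code from the commit: the function names `sketchEqualsNoCase` and `sketchMapHtmlEncoding` are hypothetical, and the table holds only the four labels named in the commit message.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: ASCII case-insensitive string equality. */
static int
sketchEqualsNoCase(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/*
 * Hypothetical sketch: return the encoding an HTML parser should
 * actually use for a declared label. Unknown labels are returned
 * unchanged.
 */
static const char *
sketchMapHtmlEncoding(const char *name) {
    static const struct {
        const char *from;
        const char *to;
    } map[] = {
        { "ISO-8859-1", "windows-1252" },
        { "ASCII",      "windows-1252" },
        { "UCS-2",      "UTF-16LE" },
        { "UTF-16",     "UTF-16LE" }
    };
    size_t i;

    for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (sketchEqualsNoCase(name, map[i].from))
            return map[i].to;
    }
    return name;
}
```

A caller would pass the label found in the document and open a decoder for the returned name instead.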
Nick Wellnhofer
93f671064e
encoding: Add HTML5 aliases
2025-05-12 13:27:29 +02:00
Nick Wellnhofer
628006f457
encoding: Add windows-1252
...
Fixes #915.
2025-05-12 13:27:22 +02:00
Nick Wellnhofer
a7016baea6
tools: Remove unnecessary data from iso8859x.inc
2025-05-12 13:14:21 +02:00
Nick Wellnhofer
c92374f1b8
tools: Recreate script to generate iso8859x.inc
...
The script to create these tables was never committed to version
control.
2025-05-12 13:14:21 +02:00
Nick Wellnhofer
f602c0c186
html: Rework serialization of meta encoding attributes
...
Don't allocate memory.
2025-05-12 00:05:02 +02:00
Nick Wellnhofer
7654c2efc0
html: Rework serialization of URIs
...
Don't allocate memory.
2025-05-12 00:04:00 +02:00
Nick Wellnhofer
bd777e4f42
html: Speed up htmlIsBooleanAttr
...
This is used when serializing.
2025-05-11 23:28:40 +02:00
Nick Wellnhofer
825f3a9d0c
html: Always serialize attributes with double quotes
...
Align with HTML5.
2025-05-11 21:42:51 +02:00
Nick Wellnhofer
5c4cc456a4
html: Escape encoding in meta tags
2025-05-11 21:30:30 +02:00
Nick Wellnhofer
0674ccb7cb
html: Stop omitting end tags when serializing
...
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
05b8fe0a06
html: Don't escape RAWTEXT and PLAINTEXT
...
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
809ded586b
html: Add more empty elements
...
Add empty HTML5 elements <bgsound>, <keygen>, <source>, <track> and
<wbr>.
Make <embed> an empty element.
2025-05-11 20:46:50 +02:00
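To illustrate what "empty element" means for the serializer in 809ded586b, here is a minimal sketch of a void-element check. It is not the libxml2 implementation. The names `sketchVoidElements`, `sketchCompareName` and `sketchIsVoidElement` are hypothetical, and apart from the six elements named in the commit message, the list is an assumption based on the void elements of the HTML serialization algorithm. Names are expected in lowercase.

```c
#include <stdlib.h>
#include <string.h>

/* Must stay sorted in strcmp order, because bsearch is used below. */
static const char *const sketchVoidElements[] = {
    "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame",
    "hr", "img", "input", "keygen", "link", "meta", "param", "source",
    "track", "wbr"
};

/* bsearch callback: the key is a string, the element a pointer to one. */
static int
sketchCompareName(const void *key, const void *elem) {
    return strcmp((const char *) key, *(const char *const *) elem);
}

/* Returns nonzero if the element never has content or an end tag. */
static int
sketchIsVoidElement(const char *name) {
    return bsearch(name, sketchVoidElements,
                   sizeof(sketchVoidElements) / sizeof(sketchVoidElements[0]),
                   sizeof(sketchVoidElements[0]),
                   sketchCompareName) != NULL;
}
```

A serializer built on this check would write a start tag only for void elements, and both a start tag and an end tag for everything else.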
Nick Wellnhofer
5f8ebc8809
save: Avoid xmlOutputBufferWriteQuotedString
...
xmlOutputBufferWriteQuotedString should be reserved for things like
system IDs.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
0d81d6f811
html: Use xmlOutputBufferWrite if possible
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
89fcfe3a29
html: Start to use xmlSerializeText
...
Avoid temporary copy to speed up serialization.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
777e2adf77
io: Consolidate escaping code
...
Use generated table approach of xmlSerializeText for xmlEscapeText.
Move most code to xmlIO.c.
2025-05-11 20:29:25 +02:00
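The table approach mentioned in 777e2adf77 can be sketched roughly as follows. This is an illustration under assumptions, not the generated table or code from xmlIO.c: `sketchTextEscapes` and `sketchEscapeText` are hypothetical names, the table is written by hand, and only `&`, `<` and `>` are covered. The idea is that one array lookup per byte decides whether a replacement is needed, so runs of ordinary bytes can be copied in one go.

```c
#include <stddef.h>
#include <string.h>

/* Replacement for each byte value; NULL means "copy the byte as is". */
static const char *const sketchTextEscapes[256] = {
    ['&'] = "&amp;",
    ['<'] = "&lt;",
    ['>'] = "&gt;"
};

/*
 * Escape the NUL-terminated string `in` into `out`.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int
sketchEscapeText(const char *in, char *out, size_t outSize) {
    size_t pos = 0;

    if (outSize == 0)
        return -1;

    while (*in != '\0') {
        const char *run = in;
        const char *repl;
        size_t runLen, replLen;

        /* Scan a run of bytes that need no escaping. */
        while (*in != '\0' && sketchTextEscapes[(unsigned char) *in] == NULL)
            in++;
        runLen = (size_t) (in - run);

        /* The run ended either at a special byte or at the terminator. */
        repl = (*in != '\0') ? sketchTextEscapes[(unsigned char) *in] : "";
        replLen = strlen(repl);

        /* Keep one byte free for the terminating NUL. */
        if (runLen + replLen >= outSize - pos)
            return -1;

        memcpy(out + pos, run, runLen);
        pos += runLen;
        memcpy(out + pos, repl, replLen);
        pos += replLen;

        if (*in != '\0')
            in++;
    }

    out[pos] = '\0';
    return 0;
}
```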
Nick Wellnhofer
cdaf657ffb
html: Don't escape < and > when serializing attribute values
...
Align with HTML5.
This will break some test suites.
2025-05-11 20:29:25 +02:00
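The effect of cdaf657ffb, together with the double-quote rule from 825f3a9d0c, can be sketched as below. This is a simplified illustration, not the serializer code: `sketchAppend` and `sketchSerializeAttrValue` are hypothetical names, the output goes to a fixed caller-supplied buffer, and only `&` and `"` are escaped. Any handling of other characters, such as non-breaking spaces, is left out of the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Append `s` to `out` at `*pos`, keeping the buffer NUL-terminated. */
static int
sketchAppend(char *out, size_t outSize, size_t *pos, const char *s) {
    size_t len = strlen(s);

    /* One byte must remain for the terminating NUL. */
    if (len >= outSize - *pos)
        return -1;
    memcpy(out + *pos, s, len);
    *pos += len;
    out[*pos] = '\0';
    return 0;
}

/*
 * Write `value` as a double-quoted HTML attribute value.
 * `<` and `>` are passed through unchanged.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int
sketchSerializeAttrValue(const char *value, char *out, size_t outSize) {
    size_t pos = 0;
    const char *cur;

    if (outSize == 0)
        return -1;
    out[0] = '\0';

    if (sketchAppend(out, outSize, &pos, "\"") < 0)
        return -1;

    for (cur = value; *cur != '\0'; cur++) {
        char plain[2] = { *cur, '\0' };
        const char *repl = plain;

        if (*cur == '&')
            repl = "&amp;";
        else if (*cur == '"')
            repl = "&quot;";

        if (sketchAppend(out, outSize, &pos, repl) < 0)
            return -1;
    }

    return sketchAppend(out, outSize, &pos, "\"");
}
```

Test suites that compare serialized output byte for byte would see `<` and `>` where they previously saw `&lt;` and `&gt;`, which is the breakage the commit message warns about.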
Nick Wellnhofer
e0e0a1f0f5
html: Remove special handling of &{...} when serializing
...
See https://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
Align with HTML5.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
dad1163078
entities: Always replace invalid chars when escaping
...
The previous refactor painstakingly recreated the different behavior of
separate functions that were merged. It makes more sense to always
replace invalid chars.
Optimize IS_CHAR check for non-ASCII chars.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
c8cea39d8a
save: Fix serialization of attribute defaults containing <
...
Long-standing bug that produced invalid XML.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
971038e59f
html: Call lower-level escaping functions
...
Removes the need to pass a document around.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
63535d3922
tree: Make xmlNodeListGetStringInternal work with escape flags
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
442c1903af
doc: Fix some damage from automated conversions
...
Add some newlines, fix returns.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
98a61c9dff
doc: Fix briefs in tree docs
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
4b4bc15acf
doc: Misc fixes to buffer docs
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
ad390a5d14
parser: Set doc properties in endDocument SAX handler
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
c7c4964342
html: Move DTD creation to endDocument SAX callback
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
46f05ea4d5
html: Rework meta charset handling
...
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.
Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.
Add full support for <meta charset=""> which is also used when adding
meta tags.
Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.
Only switch encoding once when parsing.
Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.
Fixes #909.
2025-05-11 20:29:25 +02:00
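The "algorithm for extracting a character encoding from a meta element" referenced in 46f05ea4d5 is defined in the HTML standard. The sketch below follows its steps as I understand them and is not the code from the commit. All function names are hypothetical. It returns a pointer and length into the original string rather than a copy, which is one way a caller could replace only the encoding substring of a Content-Type value.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII whitespace as used by the HTML standard. */
static int
sketchIsHtmlSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Nonzero if `s` starts with the lowercase `prefix`, ignoring ASCII case. */
static int
sketchStartsWithNoCase(const char *s, const char *prefix) {
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char) *s) != *prefix)
            return 0;
    }
    return 1;
}

/*
 * Find the encoding label in a Content-Type style value such as
 * "text/html; charset=utf-8". Returns a pointer into `s` and stores
 * the label length in `*len`, or returns NULL if there is no label.
 */
static const char *
sketchExtractCharset(const char *s, size_t *len) {
    const char *pos = s;
    const char *end;

    *len = 0;

    for (;;) {
        /* Find the next occurrence of "charset". */
        while (*pos != '\0' && !sketchStartsWithNoCase(pos, "charset"))
            pos++;
        if (*pos == '\0')
            return NULL;
        pos += 7;

        while (sketchIsHtmlSpace(*pos))
            pos++;
        /* Without '=', resume the search from the current character. */
        if (*pos != '=')
            continue;
        pos++;

        while (sketchIsHtmlSpace(*pos))
            pos++;
        break;
    }

    if (*pos == '"' || *pos == '\'') {
        /* Quoted label: a matching closing quote is required. */
        end = strchr(pos + 1, *pos);
        if (end == NULL)
            return NULL;
        *len = (size_t) (end - (pos + 1));
        return pos + 1;
    }

    if (*pos == '\0')
        return NULL;

    /* Unquoted label: ends at whitespace, ';' or the end of the string. */
    end = pos;
    while (*end != '\0' && !sketchIsHtmlSpace(*end) && *end != ';')
        end++;
    *len = (size_t) (end - pos);
    return pos;
}
```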
Nick Wellnhofer
9aaa52fe48
tree: Make xmlNodeAddContent work with attributes
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
655ac5f851
html: Add comment regarding hack for XML documents
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
f3a080bc48
html: Ignore U+0000 in body text
...
Align with HTML5. Fixes #908.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
a1e83b2401
io: Fix negation of potentially unsigned value
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
b3854fe964
reader: Fix null deref on malloc failure
...
Short-lived regression from 177067ea.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
6684eb9350
fuzz: Fix out-of-tree build
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
6bd380ce1c
fuzz: Update README
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
967df734c5
malloc-fail: Handle malloc failure in xmlSchemaCopyValue
...
Avoid null pointer dereference. Fixes #905.
2025-05-11 20:29:25 +02:00
Pavel Kopylov
4ed7157406
python: fix use-after-free in functions xmlPythonFileReadRaw(), xmlPythonFileRead()
...
with python2.
Fixes #910.
2025-05-09 11:58:01 +02:00
Nick Wellnhofer
38ea8fa9de
doc: Fix varargs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
9bbffec568
doc: Move brief to top, params to bottom of doc comments
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
7bc7ae9db3
doc: Enable Doxygen autobrief
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
ab13fbfd68
doc: Misc fixes to error docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b1685459a3
doc: Misc fixes to xmlsave docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
7d689fabda
doc: Fix doc installation with Autotools
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
7b59e74c5f
doc: Always use case sensitive filenames with Doxygen
...
Avoid platform-specific behavior.
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
298f70b3d7
doc: Misc fixes to HTML tree docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
18d20a68bc
doc: More fine-grained redirects for old pages
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
80b6429fb3
doc: Misc fixes to encoding docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
81ac2e27fd
doc: Misc fixes to valid docs
2025-05-06 19:51:38 +02:00