Nick Wellnhofer
46f05ea4d5
html: Rework meta charset handling
...
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.
Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.
Add full support for <meta charset=""> which is also used when adding
meta tags.
Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.
Only switch encoding once when parsing.
Fix htmlSaveFileFormat so that a NULL encoding no longer declares a
misleading UTF-8 charset.
Fixes #909.
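For readers unfamiliar with the HTML5 "algorithm for extracting a character
encoding from a meta element", the following is a minimal standalone sketch
of its steps. It is not the code from this commit: the function name
`extract_meta_charset` and its helpers are hypothetical, and the sketch only
locates the encoding label inside a Content-Type value without resolving it
to an actual encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ASCII whitespace as defined by HTML5: space, tab, LF, FF, CR. */
static int is_ascii_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* ASCII case-insensitive match of "charset" at p (stops at a NUL). */
static int match_charset(const char *p) {
    static const char word[] = "charset";
    size_t i;

    for (i = 0; i < 7; i++) {
        char c = p[i];
        if (c >= 'A' && c <= 'Z')
            c += 'a' - 'A';
        if (c != word[i])
            return 0;
    }
    return 1;
}

/*
 * Hypothetical helper: returns a pointer to the encoding label inside s
 * and stores its length in *len, or returns NULL if there is none.
 */
static const char *extract_meta_charset(const char *s, size_t *len) {
    const char *p = s;

    assert(s != NULL && len != NULL);

    for (;;) {
        const char *end;

        /* Find the next occurrence of "charset". */
        while (*p != '\0' && !match_charset(p))
            p++;
        if (*p == '\0')
            return NULL;
        p += 7;

        /* Skip whitespace; without '=', resume the search from here. */
        while (is_ascii_ws(*p))
            p++;
        if (*p != '=')
            continue;
        p++;
        while (is_ascii_ws(*p))
            p++;

        if (*p == '\0')
            return NULL;
        if (*p == '"' || *p == '\'') {
            /* Quoted value: an unmatched quote yields nothing. */
            end = strchr(p + 1, *p);
            if (end == NULL)
                return NULL;
            p++;
        } else {
            /* Unquoted value: runs up to whitespace, ';' or the end. */
            end = p;
            while (*end != '\0' && !is_ascii_ws(*end) && *end != ';')
                end++;
        }
        *len = (size_t)(end - p);
        return p;
    }
}
```

Returning a pointer and a length rather than a copy makes it possible to
rewrite only the encoding substring and leave the rest of the attribute
value untouched, which is the behavior the commit message describes for
Content-Type meta tags.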
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
38ea8fa9de
doc: Fix varargs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
9bbffec568
doc: Move brief to top, params to bottom of doc comments
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
ab13fbfd68
doc: Misc fixes to error docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b1685459a3
doc: Misc fixes to xmlsave docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
298f70b3d7
doc: Misc fixes to HTML tree docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
80b6429fb3
doc: Misc fixes to encoding docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
81ac2e27fd
doc: Misc fixes to valid docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
714decd6d6
doc: Misc fixes to entities docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
f38f3e7b25
doc: Misc fixes to IO documentation
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
e6cfd04994
doc: Misc fixes to tree docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
1bf44f09ba
doc: Misc fixes to parser docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b7274fb02f
doc: Misc fixes to HTML parser docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
411f30ef2a
doc: Don't document legacy HTML parser macros
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
4a01087585
doc: Move parser option docs to enum
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
a449c5fde3
catalog: Deprecate some functions
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
075283d49d
xlink: Deprecate remaining public function
...
This was never finished.
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
2c150e62f5
doc: Formatting fixes
2025-05-02 20:21:39 +02:00
Nick Wellnhofer
08a282f9f7
doc: Doxygen fixes for xmlversion.h
2025-05-02 20:12:52 +02:00
Nick Wellnhofer
e78e05c990
doc: Fix autolinks to functions
...
Unfortunately, autolinks in .c files aren't converted by Doxygen for
some reason.
2025-05-02 17:45:31 +02:00
Nick Wellnhofer
f7c412874b
doc: Remove more comment block headers
2025-05-02 17:41:26 +02:00
Nick Wellnhofer
0ffa7dd8b1
include: Add hyperlink to deprecation warnings
...
Doxygen creates a nice "deprecated list" for us.
2025-05-02 14:52:03 +02:00
Nick Wellnhofer
1eca6e3476
parser: Deprecate xmlClearParserCtxt
2025-05-02 13:33:35 +02:00
Nick Wellnhofer
e525564f65
doc: Remove empty lines at start of block
...
These lines were left over after automatic conversion.
2025-05-02 11:42:05 +02:00
Nick Wellnhofer
fd6ab89be3
doc: Adjust documentation of public structs
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
8816f267be
doc: Adjust documentation of enums
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
e549622bc5
doc: Convert documentation to Doxygen
...
Automated conversion based on a few regexes.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
69879da88f
doc: Remove email addresses from documentation
...
Also remove authorship information from generated files, hash.c and
globals.c which were rewritten.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
61890e399d
doc: Prepare for conversion to Doxygen
...
Fix many params in internal functions (not really necessary but Doxygen
warns about that in XML mode).
Fix formatting in a few corner cases that automatic conversion can't
handle.
Rearrange some DOC_DISABLE blocks.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
87b30343f6
io: Fix linkage of __xml*BufferCreateFilename functions
...
Make these functions usable on Windows.
2025-04-29 20:36:25 +02:00
Nick Wellnhofer
fc8899d47c
parser: Make xmlCtxtGetValidCtxt depend on VALID_ENABLED
2025-04-27 13:01:42 +02:00
Nick Wellnhofer
b85d77d156
http: Remove built-in HTTP client
...
Stubs are retained for ABI compatibility.
Fixes #631.
Obsoletes #160.
2025-04-20 18:21:06 +02:00
Nick Wellnhofer
4ba1f9238a
html: Avoid HTML_PARSE_HTML5 clashing with XML_PARSE_NOENT
...
There are several users that pass invalid XML parser options to the
HTML parser. Choose a value that is less likely to clash.
2025-04-18 18:48:25 +02:00
Nick Wellnhofer
aa4ef7737b
parser: Deprecate output-related globals
2025-04-17 21:14:00 +02:00
Nick Wellnhofer
fc4adba90e
error: Fix initGenericErrorDefaultFunc compatibility macro
2025-04-12 16:26:07 +02:00
Nick Wellnhofer
97ffa77d6d
encoding: Deprecate non-thread-safe functions
2025-04-10 17:36:58 +02:00
Nick Wellnhofer
2ecc08f6dc
html: Deprecate more functions
2025-04-10 16:36:03 +02:00
Nick Wellnhofer
b349225952
include: Change some return types from int to enum
...
This also affects some new functions from 2.13.
2025-03-14 02:31:01 +01:00
Nick Wellnhofer
fd1b939168
include: Convert some macros to enums
2025-03-14 00:35:40 +01:00
Nick Wellnhofer
84c6524e26
encoding: Support input-only and output-only converters
...
Make it possible to open an encoding handler only for input or output.
This avoids the creation of unnecessary converters.
Should also fix #863.
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
69b83bb68e
encoding: Detect truncated multi-byte sequences with ICU
...
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
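To illustrate why a `flush` flag helps, here is a minimal sketch of
detecting a truncated multi-byte sequence at the end of a UTF-8 buffer.
It is not libxml2's implementation: the names `utf8_truncated_tail`,
`check_tail` and the `CONV_*` codes are hypothetical, and UTF-8 is only
assumed here as a convenient example encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum { CONV_OK = 0, CONV_NEED_MORE = 1, CONV_TRUNCATED = -1 };

/* Expected sequence length for a UTF-8 lead byte, 0 if not a lead byte. */
static size_t utf8_seq_len(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/*
 * Returns how many bytes at the end of buf belong to a multi-byte
 * sequence that stops short, or 0 if the buffer does not end in one.
 */
static size_t utf8_truncated_tail(const unsigned char *buf, size_t len) {
    size_t back;

    assert(buf != NULL || len == 0);

    /* A lead byte of an incomplete sequence is at most 3 bytes back. */
    for (back = 1; back <= 3 && back <= len; back++) {
        unsigned char c = buf[len - back];

        if ((c & 0xC0) == 0x80)
            continue;               /* continuation byte, keep looking */
        return utf8_seq_len(c) > back ? back : 0;
    }
    return 0;
}

/*
 * Without flush, leftover bytes only mean "wait for more input".
 * With flush, no more input will arrive, so they are an error.
 */
static int check_tail(const unsigned char *buf, size_t len, int flush) {
    if (utf8_truncated_tail(buf, len) == 0)
        return CONV_OK;
    return flush ? CONV_TRUNCATED : CONV_NEED_MORE;
}
```

The same leftover bytes are harmless in the middle of a stream and an error
at its end, so the caller has to tell the converter which case applies.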
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
03a8f1dd75
doc: Document SAX handlers a little more
2025-03-11 18:53:59 +01:00
Nick Wellnhofer
87c9e000e5
encoding: Rework custom encoding implementation API
2025-03-09 22:37:13 +01:00
Nick Wellnhofer
ba9148d8a5
parser: Undeprecate input->consumed
...
Should be deprecated after fixing #762.
2025-03-09 20:30:49 +01:00
Nick Wellnhofer
a0dbf030ee
parser: Undeprecate ctxt->loadsubset
...
Should be deprecated after fixing #873.
2025-03-09 20:24:06 +01:00
Nick Wellnhofer
d96911f100
doc: Documentation fixes
2025-03-08 23:03:26 +01:00
Nick Wellnhofer
5f0b1378d7
parser: Add more parser context accessors
...
Fixes #763.
2025-03-08 22:36:06 +01:00
Nick Wellnhofer
38f475072a
encoding: Make conversion callbacks more type-safe
2025-03-05 22:25:14 +01:00
Nick Wellnhofer
a846d96468
encoding: Remove compatibility struct members
2025-03-05 16:49:42 +01:00
Nick Wellnhofer
94d8a3e231
parser: Convert xmlParserMaxDepth to macro
2025-03-05 14:56:46 +01:00