mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00
Commit Graph

1352 Commits

Author SHA1 Message Date
Nick Wellnhofer
777e2adf77 io: Consolidate escaping code
Use generated table approach of xmlSerializeText for xmlEscapeText.

Move most code to xmlIO.c.
2025-05-11 20:29:25 +02:00
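For readers unfamiliar with the "generated table approach" mentioned in commit 777e2adf77, the following is a minimal sketch of table-driven escaping in general. It is not the code from xmlIO.c: the table `escape_tab`, the function `escape_text`, and the choice of escaped characters are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup table, indexed by byte value. A NULL entry means
 * "copy the byte unchanged"; otherwise the entry is the replacement.
 * This is an illustration, not libxml2's actual table. */
static const char *const escape_tab[256] = {
    ['&'] = "&amp;",
    ['<'] = "&lt;",
    ['>'] = "&gt;",
    ['"'] = "&quot;",
};

/* Escape the NUL-terminated string src into dst, which holds dst_size
 * bytes including the terminating NUL. Returns the number of bytes
 * written (excluding the NUL), or (size_t)-1 if dst is too small. */
size_t escape_text(const char *src, char *dst, size_t dst_size) {
    size_t out = 0;

    for (; *src != '\0'; src++) {
        const char *rep = escape_tab[(unsigned char)*src];
        size_t n = rep ? strlen(rep) : 1;

        /* Always keep one byte free for the terminator. */
        if (out + n >= dst_size)
            return (size_t)-1;
        if (rep)
            memcpy(dst + out, rep, n);
        else
            dst[out] = *src;
        out += n;
    }
    if (out >= dst_size)
        return (size_t)-1;
    dst[out] = '\0';
    return out;
}
```

The appeal of a table is that the per-byte decision becomes a single array lookup instead of a chain of comparisons, and several escaping variants can share one loop by swapping tables.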
Nick Wellnhofer
dad1163078 entities: Always replace invalid chars when escaping
The previous refactor painstakingly recreated the different behavior of
separate functions that were merged. It makes

Optimize IS_CHAR check for non-ASCII chars.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
971038e59f html: Call lower-level escaping functions
Removes the need to pass a document around.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
63535d3922 tree: Make xmlNodeListGetStringInternal work with escape flags 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
442c1903af doc: Fix some damage from automated conversions
Add some newlines, fix returns.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
98a61c9dff doc: Fix briefs in tree docs 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
46f05ea4d5 html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.

Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.

Add full support for <meta charset=""> which is also used when adding
meta tags.

Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.

Only switch encoding once when parsing.

Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.

Fixes #909.
2025-05-11 20:29:25 +02:00
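The HTML5 "algorithm for extracting a character encoding from a meta element" named in commit 46f05ea4d5 can be sketched roughly as follows. This is an illustration written from the HTML specification's description, not the code in libxml2; the function name `extract_meta_charset` and its interface are assumptions. It returns the raw substring that would be replaced in a Content-Type value and leaves mapping that label to an encoding to the caller.

```c
#include <stddef.h>
#include <string.h>

/* ASCII whitespace as defined by the HTML specification. */
static int is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* ASCII case-insensitive match for "charset" at s. */
static int match_charset(const char *s) {
    static const char word[] = "charset";
    for (size_t i = 0; i < 7; i++) {
        char c = s[i];
        if (c >= 'A' && c <= 'Z')
            c += 'a' - 'A';
        if (c != word[i])   /* also stops at the terminating NUL */
            return 0;
    }
    return 1;
}

/* Hypothetical helper: locate the encoding label inside the content
 * attribute value s. Returns a pointer into s and stores the label
 * length in *len, or returns NULL if no label is found. */
const char *extract_meta_charset(const char *s, size_t *len) {
    const char *p = s;

    for (;;) {
        while (*p != '\0' && !match_charset(p))
            p++;
        if (*p == '\0')
            return NULL;
        p += 7;
        while (is_ws(*p))
            p++;
        if (*p != '=')
            continue;           /* no "=": resume the search from here */
        p++;
        while (is_ws(*p))
            p++;
        break;
    }

    if (*p == '\0')
        return NULL;
    if (*p == '"' || *p == '\'') {
        const char *end = strchr(p + 1, *p);
        if (end == NULL)
            return NULL;        /* unmatched quote */
        *len = (size_t)(end - (p + 1));
        return p + 1;
    }

    /* Unquoted: the label ends at whitespace, ";" or end of string. */
    const char *end = p;
    while (*end != '\0' && !is_ws(*end) && *end != ';')
        end++;
    *len = (size_t)(end - p);
    return p;
}
```

Returning a pointer and length, rather than a copy, makes it straightforward to rewrite only the encoding substring of a Content-Type value and leave the rest untouched.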
Nick Wellnhofer
38ea8fa9de doc: Fix varargs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
9bbffec568 doc: Move brief to top, params to bottom of doc comments 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
ab13fbfd68 doc: Misc fixes to error docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b1685459a3 doc: Misc fixes to xmlsave docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
298f70b3d7 doc: Misc fixes to HTML tree docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
80b6429fb3 doc: Misc fixes to encoding docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
81ac2e27fd doc: Misc fixes to valid docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
714decd6d6 doc: Misc fixes to entities docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
f38f3e7b25 doc: Misc fixes to IO documentation 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
e6cfd04994 doc: Misc fixes to tree docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
1bf44f09ba doc: Misc fixes to parser docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b7274fb02f doc: Misc fixes to HTML parser docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
411f30ef2a doc: Don't document legacy HTML parser macros 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
4a01087585 doc: Move parser option docs to enum 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
a449c5fde3 catalog: Deprecate some functions 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
075283d49d xlink: Deprecate remaining public function
This was never finished.
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
2c150e62f5 doc: Formatting fixes 2025-05-02 20:21:39 +02:00
Nick Wellnhofer
08a282f9f7 doc: Doxygen fixes for xmlversion.h 2025-05-02 20:12:52 +02:00
Nick Wellnhofer
e78e05c990 doc: Fix autolinks to functions
Unfortunately, autolinks in .c files aren't converted by Doxygen for
some reason.
2025-05-02 17:45:31 +02:00
Nick Wellnhofer
f7c412874b doc: Remove more comment block headers 2025-05-02 17:41:26 +02:00
Nick Wellnhofer
0ffa7dd8b1 include: Add hyperlink to deprecation warnings
Doxygen creates a nice "deprecated list" for us.
2025-05-02 14:52:03 +02:00
Nick Wellnhofer
1eca6e3476 parser: Deprecate xmlClearParserCtxt 2025-05-02 13:33:35 +02:00
Nick Wellnhofer
e525564f65 doc: Remove empty lines at start of block
These lines were left over after automatic conversion.
2025-05-02 11:42:05 +02:00
Nick Wellnhofer
fd6ab89be3 doc: Adjust documentation of public structs 2025-05-01 23:23:42 +02:00
Nick Wellnhofer
8816f267be doc: Adjust documentation of enums 2025-05-01 23:23:42 +02:00
Nick Wellnhofer
e549622bc5 doc: Convert documentation to Doxygen
Automated conversion based on a few regexes.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
69879da88f doc: Remove email addresses from documentation
Also remove authorship information from generated files, hash.c and
globals.c which were rewritten.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
61890e399d doc: Prepare for conversion to Doxygen
Fix many params in internal functions (not really necessary but Doxygen
warns about that in XML mode).

Fix formatting in a few corner cases that automatic conversion can't
handle.

Rearrange some DOC_DISABLE blocks.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
87b30343f6 io: Fix linkage of __xml*BufferCreateFilename functions
Make these functions usable on Windows.
2025-04-29 20:36:25 +02:00
Nick Wellnhofer
fc8899d47c parser: Make xmlCtxtGetValidCtxt depend on VALID_ENABLED 2025-04-27 13:01:42 +02:00
Nick Wellnhofer
b85d77d156 http: Remove built-in HTTP client
Stubs are retained for ABI compatibility.

Fixes #631.
Obsoletes #160.
2025-04-20 18:21:06 +02:00
Nick Wellnhofer
4ba1f9238a html: Avoid HTML_PARSE_HTML5 clashing with XML_PARSE_NOENT
There are several users that pass invalid XML parser options to the
HTML parser. Choose a value that is less likely to clash.
2025-04-18 18:48:25 +02:00
Nick Wellnhofer
aa4ef7737b parser: Deprecate output-related globals 2025-04-17 21:14:00 +02:00
Nick Wellnhofer
fc4adba90e error: Fix initGenericErrorDefaultFunc compatibility macro 2025-04-12 16:26:07 +02:00
Nick Wellnhofer
97ffa77d6d encoding: Deprecate non-thread-safe functions 2025-04-10 17:36:58 +02:00
Nick Wellnhofer
2ecc08f6dc html: Deprecate more functions 2025-04-10 16:36:03 +02:00
Nick Wellnhofer
b349225952 include: Change some return types from int to enum
This also affects some new functions from 2.13.
2025-03-14 02:31:01 +01:00
Nick Wellnhofer
fd1b939168 include: Convert some macros to enums 2025-03-14 00:35:40 +01:00
Nick Wellnhofer
84c6524e26 encoding: Support input-only and output-only converters
Make it possible to open an encoding handler only for input or output.
This avoids the creation of unnecessary converters.

Should also fix #863.
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
69b83bb68e encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.

It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.

Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00
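To make "truncated multi-byte sequence at the end of an input buffer" from commit 69b83bb68e concrete, here is a small standalone sketch for UTF-8 only. It is not how libxml2 performs the check (the commit describes relying on a `flush` flag and on the converter's own error reporting); the helper name `utf8_truncated_tail` is an assumption, and the code does not validate the rest of the buffer or reject invalid lead bytes.

```c
#include <stddef.h>

/* Total sequence length implied by a UTF-8 lead byte, or 0 if the byte
 * is a continuation byte or otherwise not a lead byte. */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/* Hypothetical helper: return the number of bytes at the end of buf
 * that form an incomplete UTF-8 sequence, or 0 if the buffer ends on a
 * sequence boundary (or the tail is malformed rather than truncated). */
size_t utf8_truncated_tail(const unsigned char *buf, size_t len) {
    size_t back = 0;

    /* Walk back over at most three continuation bytes (10xxxxxx). */
    while (back < len && back < 3 &&
           (buf[len - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == len)
        return 0;   /* empty, or continuation bytes only */

    size_t need = utf8_seq_len(buf[len - 1 - back]);
    size_t have = back + 1;
    return need > have ? have : 0;
}
```

A caller reading input in chunks could hold back the reported number of bytes and prepend them to the next chunk; if no further input arrives, a nonzero result at end of stream indicates a truncated sequence.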
Nick Wellnhofer
03a8f1dd75 doc: Document SAX handlers a little more 2025-03-11 18:53:59 +01:00
Nick Wellnhofer
87c9e000e5 encoding: Rework custom encoding implementation API 2025-03-09 22:37:13 +01:00
Nick Wellnhofer
ba9148d8a5 parser: Undeprecate input->consumed
Should be deprecated after fixing #762.
2025-03-09 20:30:49 +01:00