Nick Wellnhofer
7bd8d1d9cc
doc: Prefix autolinks with '#'
...
Use `#func` instead of `func()` to ignore parameters and make all
autolinks work.
2025-05-28 16:01:52 +02:00
Nick Wellnhofer
258d870629
codegen: Consolidate tools for code generation
...
Move tools, source files and output tables into codegen directory.
Rename some files.
Adjust tools to match modified files. Remove generation date and source
files from output.
Distribute all tools and sources.
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
adfbeb7e08
doc: Stop using *Ptr typedefs in documentation
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a40f36e7f2
include: Stop using *Ptr typedefs in public headers
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
2d83a84ca6
doc: Misc improvements
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
b0234633e7
encoding: Preserve original encoding label
...
When using built-in encodings, the label would be normalized which
causes various issues. We now create a copy of the handler with the
original name.
This is somewhat dangerous as it will require users to free built-in
encodings with xmlCharEncCloseFunc. But to handle the general case, this
was already required.
Fixes #916 in another way than originally proposed.
2025-05-13 22:53:02 +02:00
Nick Wellnhofer
19b9931184
encoding: Fix -Wswitch warning
2025-05-12 21:07:41 +02:00
Nick Wellnhofer
f0983199e8
html: Map some encodings according to HTML5
...
Windows-1252 is a superset of ISO-8859-1 and should be used instead.
Same for ASCII.
Also map UCS-2 and UTF-16 to UTF-16LE.
2025-05-12 14:04:30 +02:00
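The mapping described in f0983199e8 can be pictured as a small lookup table. This is only a sketch of the idea, not the code from the commit: the names `labelEquals`, `html5Map` and `mapHtmlEncoding` are hypothetical, and the table holds only the four labels named in the commit message.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: ASCII case-insensitive string equality. */
static int
labelEquals(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/* Hypothetical table: declared label -> encoding actually used for HTML. */
static const char *const html5Map[][2] = {
    { "ISO-8859-1", "windows-1252" },
    { "ASCII",      "windows-1252" },
    { "UCS-2",      "UTF-16LE" },
    { "UTF-16",     "UTF-16LE" },
};

/* Return the mapped label, or the input label if no mapping applies. */
static const char *
mapHtmlEncoding(const char *label) {
    size_t i;

    for (i = 0; i < sizeof(html5Map) / sizeof(html5Map[0]); i++) {
        if (labelEquals(label, html5Map[i][0]))
            return html5Map[i][1];
    }
    return label;
}
```

Labels that are not in the table, such as "UTF-8", pass through unchanged.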
Nick Wellnhofer
93f671064e
encoding: Add HTML5 aliases
2025-05-12 13:27:29 +02:00
Nick Wellnhofer
628006f457
encoding: Add windows-1252
...
Fixes #915.
2025-05-12 13:27:22 +02:00
Nick Wellnhofer
777e2adf77
io: Consolidate escaping code
...
Use generated table approach of xmlSerializeText for xmlEscapeText.
Move most code to xmlIO.c.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
9bbffec568
doc: Move brief to top, params to bottom of doc comments
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
80b6429fb3
doc: Misc fixes to encoding docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
cb1635a642
doc: Use @since command
2025-05-02 19:05:25 +02:00
Nick Wellnhofer
e78e05c990
doc: Fix autolinks to functions
...
Unfortunately, autolinks in .c files aren't converted by Doxygen for
some reason.
2025-05-02 17:45:31 +02:00
Nick Wellnhofer
e525564f65
doc: Remove empty lines at start of block
...
These lines were left over after automatic conversion.
2025-05-02 11:42:05 +02:00
Nick Wellnhofer
e549622bc5
doc: Convert documentation to Doxygen
...
Automated conversion based on a few regexes.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
69879da88f
doc: Remove email addresses from documentation
...
Also remove authorship information from generated files, hash.c and
globals.c which were rewritten.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
97ffa77d6d
encoding: Deprecate non-thread-safe functions
2025-04-10 17:36:58 +02:00
Nick Wellnhofer
b349225952
include: Change some return types from int to enum
...
This also affects some new functions from 2.13.
2025-03-14 02:31:01 +01:00
Nick Wellnhofer
84c6524e26
encoding: Support input-only and output-only converters
...
Make it possible to open an encoding handler only for input or output.
This avoids the creation of unnecessary converters.
Should also fix #863.
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
69b83bb68e
encoding: Detect truncated multi-byte sequences with ICU
...
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00
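The `flush` idea from 69b83bb68e can be sketched for the non-ICU case, where a converter leaves an incomplete trailing sequence unconsumed. This is a simplified illustration under stated assumptions, not the library's implementation: `seqLen`, `consumeInput` and the `CONV_*` codes are hypothetical names, the input is assumed to be UTF-8, and no validation of continuation bytes is done.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum { CONV_OK = 0, CONV_TRUNCATED = -1 };

/* Length implied by a UTF-8 lead byte; any other byte counts as 1. */
static size_t
seqLen(unsigned char c) {
    if (c >= 0xF0) return 4;
    if (c >= 0xE0) return 3;
    if (c >= 0xC0) return 2;
    return 1;
}

/*
 * Consume only whole sequences and store the byte count in *consumed.
 * Without flush, leftover bytes are fine: more input may follow.
 * With flush, leftover bytes mean the input ended mid-sequence.
 */
static int
consumeInput(const unsigned char *in, size_t len, int flush,
             size_t *consumed) {
    size_t pos = 0;

    while (pos < len) {
        size_t n = seqLen(in[pos]);

        if (n > len - pos)
            break;              /* incomplete trailing sequence */
        pos += n;
    }

    *consumed = pos;
    if (flush && pos < len)
        return CONV_TRUNCATED;
    return CONV_OK;
}
```

The same leftover bytes are harmless in the middle of a stream and an error at its end, which is why the caller has to signal the end explicitly.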
Nick Wellnhofer
ef44c240f5
encoding: Fix memory leak in xmlCharEncNewCustomHandler
...
Short-lived regression.
2025-03-10 14:16:14 +01:00
Nick Wellnhofer
87c9e000e5
encoding: Rework custom encoding implementation API
2025-03-09 22:37:13 +01:00
Nick Wellnhofer
38f475072a
encoding: Make conversion callbacks more type-safe
2025-03-05 22:25:14 +01:00
Nick Wellnhofer
a846d96468
encoding: Remove compatibility struct members
2025-03-05 16:49:42 +01:00
Nick Wellnhofer
0b27097a92
encoding: Rename unprefixed public functions
2025-03-04 16:46:53 +01:00
Nick Wellnhofer
3793eaadb7
fuzz: Fix build
2025-02-16 13:55:18 +01:00
Nick Wellnhofer
9c16a153d8
Revert "include: Make most IS_* macros private"
...
This reverts commit 84a6c82ff8.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
cfc854b839
fuzz: Work around glibc iconv() bug
2025-02-11 00:21:12 +01:00
Nick Wellnhofer
c4f760be8a
encoding: Handle iconv() returning EOPNOTSUPP on Apple
...
iconv() really shouldn't return undocumented error codes.
2025-02-02 11:15:45 +01:00
Nick Wellnhofer
cdfb54ff7b
Fix typos
2025-01-31 18:41:41 +01:00
Nick Wellnhofer
6ec616ba26
encoding: Don't allow POSIX indicator suffixes in encoding names
...
Suffixes like "//IGNORE" change the behavior of iconv.
Also add comment on how we currently rely on GNU libiconv behavior
which technically violates the POSIX spec.
2025-01-24 20:47:52 +01:00
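One simple way to keep suffixes such as "//IGNORE" away from iconv, as 6ec616ba26 describes, is to reject any encoding name that contains "//". The sketch below assumes that approach. The function name `encodingNameHasSuffix` is hypothetical and the actual check in the commit may differ.

```c
#include <string.h>

/*
 * Return 1 if the name carries an iconv-style indicator suffix
 * such as "//IGNORE" or "//TRANSLIT", 0 otherwise. A caller would
 * treat such a name as an unsupported encoding instead of passing
 * it on to iconv_open().
 */
static int
encodingNameHasSuffix(const char *name) {
    return strstr(name, "//") != NULL;
}
```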
Nick Wellnhofer
fbaacfe223
encoding: Clean up UCS-4 encodings
...
Use "UCS-*" instead of "ISO-10646-UCS-*". While the XML spec recommends
"ISO-10646-UCS-2" and "ISO-10646-UCS-4", GNU iconv doesn't understand
these names.
Ignore UCS4_2143 and UCS4_3412 which were never supported.
2025-01-16 16:09:14 +01:00
Nick Wellnhofer
df0f16fa26
encoding: Check reallocations for overflow
2024-12-21 19:37:37 +01:00
Nick Wellnhofer
dae160c64b
encoding: Fix table entry for "UTF16"
2024-09-13 12:08:20 +02:00
Nick Wellnhofer
6e503eb742
encoding: Handle more ICU error codes
...
U_ILLEGAL_ESCAPE_SEQUENCE and U_UNSUPPORTED_ESCAPE_SEQUENCE can occur
with ISO-2022.
2024-09-10 03:34:46 +02:00
Nick Wellnhofer
55d36c5990
encoding: Fix error code in xmlUconvConvert
...
Broke in 46ec621e.
2024-09-10 03:11:18 +02:00
Nick Wellnhofer
34c9108f15
encoding: Add sizeOut argument to xmlCharEncInput
...
When push parsing, we want to convert as much of the input as possible.
When pull parsing memory buffers, we want to convert data chunk by chunk
to save memory.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
1cfc5b8089
entities: Rework serialization of numeric character references
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
69f12d6d47
encoding: Deprecate xmlByteConsumed
...
This was only used by Chromium/WebKit to detect whether xmlParseContent
really succeeded. It's a horrible, overcomplicated hack.
See 8c5848bd and #767.
2024-07-13 15:42:02 +02:00
Nick Wellnhofer
d099795611
encoding: Readd some UTF-8 validation to encoders
...
This isn't strictly needed but avoids generating invalid UTF-16 and
unsigned integer overflows.
2024-07-10 22:26:19 +02:00
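The concern in d099795611 can be illustrated with a UTF-16 encoder that checks the decoded code point before emitting code units. This is a minimal sketch, not the library's encoder, and `utf16Encode` is a hypothetical name. Without the range checks, a surrogate value would be written out as invalid UTF-16, and a value above 0x10FFFF would produce a wrong surrogate pair.

```c
/*
 * Encode one code point as UTF-16 code units.
 * Returns the number of units written (1 or 2), or -1 if the
 * value is not a valid Unicode scalar value.
 */
static int
utf16Encode(unsigned long c, unsigned short out[2]) {
    if (c >= 0xD800 && c <= 0xDFFF)
        return -1;              /* lone surrogate */
    if (c > 0x10FFFF)
        return -1;              /* beyond the Unicode range */

    if (c < 0x10000) {
        out[0] = (unsigned short) c;
        return 1;
    }

    /* Safe: c is at least 0x10000 here, so this cannot wrap. */
    c -= 0x10000;
    out[0] = (unsigned short) (0xD800 | (c >> 10));
    out[1] = (unsigned short) (0xDC00 | (c & 0x3FF));
    return 2;
}
```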
Nick Wellnhofer
f48eefe3d0
encoding: Rework xmlByteConsumed
...
Don't loop infinitely if input buffer is too large. Allocate conversion
buffer on the heap.
2024-07-09 14:25:32 +02:00
Nick Wellnhofer
f86d17c163
encoding: Fix xmlParseCharEncoding
...
Make "UTF-16" return the UTF16LE handler as before.
Fix error return.
2024-07-04 15:47:20 +02:00
Nick Wellnhofer
46ec621eb7
encoding: Clarify xmlUconvConvert
2024-07-03 16:06:59 +02:00
Nick Wellnhofer
48fec2429b
encoding: Remove duplicate code
...
Fix recent commit.
2024-07-03 15:11:20 +02:00
Nick Wellnhofer
71fb257912
encoding: Fix ICU build
2024-07-03 14:35:49 +02:00
Nick Wellnhofer
9a4770ef84
doc: Improve documentation
2024-07-02 13:34:04 +02:00
Nick Wellnhofer
0b0dd98983
parser: Fix EBCDIC detection
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
37a9ff11d8
encoding: Simplify xmlCharEncCloseFunc
2024-07-01 18:05:40 +02:00