1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-21 14:53:44 +03:00
Commit Graph

177 Commits

Author SHA1 Message Date
Nick Wellnhofer
6a6a46f017 doc: Fix autolink errors
Fix links, remove links to internal functions.
2025-05-28 16:02:41 +02:00
Nick Wellnhofer
7bd8d1d9cc doc: Prefix autolinks with '#'
Use `#func` instead of `func()` to ignore parameters and make all
autolinks work.
2025-05-28 16:01:52 +02:00
Nick Wellnhofer
adfbeb7e08 doc: Stop using *Ptr typedefs in documentation 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a40f36e7f2 include: Stop using *Ptr typedefs in public headers 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
af4fae5ae3 html: Add some comments regarding HTML5 serialization
It seems that the specification of the HTML output method in XSLT 1.0
had a lot of influence on how the HTML serializer in libxml2 ended up:

https://www.w3.org/TR/xslt-10/#section-HTML-Output-Method

There are two remaining behaviors suggested by XSLT 1.0 that don't match
the HTML5 fragment serialization algorithm:

We escape non-ASCII characters in URI attributes (the list of which is
probably outdated). This was originally recommended in appendix B of the
HTML 4.01 spec, but only for user agents:

https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.1

From my experience, any tool that processes HTML should escape as little
as possible. For example, we used to escape many more characters which
are invalid in URIs, but often used in template languages. (Note that we
still escape whitespace and control chars.) Nevertheless, I guess that
some libxslt users continue to expect this behavior from libxml2.

Then we collapse Boolean attributes using an outdated list. This is
mostly a cosmetic issue, but a somewhat important one for libxslt users.

We probably need a serialization option for the xmlsave module that
enables fully HTML5-conformant output.
2025-05-13 23:00:51 +02:00
Nick Wellnhofer
fcb7a777ce io: Make xmlOutputBufferCreate* not free encoder on error
Revert a530ff12 which was an inadvertent API change.
2025-05-13 22:44:42 +02:00
Nick Wellnhofer
5b71dca613 Fix -Wunterminated-string-initialization warnings
Don't use strings for table.
2025-05-12 21:58:06 +02:00
Nick Wellnhofer
c2929b5dd3 html: Ignore namespaces when handling meta tags
Revert to old behavior to fix issues with XHTML documents.
2025-05-12 21:01:35 +02:00
Nick Wellnhofer
f602c0c186 html: Rework serialization of meta encoding attributes
Don't allocate memory.
2025-05-12 00:05:02 +02:00
Nick Wellnhofer
7654c2efc0 html: Rework serialization of URIs
Don't allocate memory.
2025-05-12 00:04:00 +02:00
Nick Wellnhofer
bd777e4f42 html: Speed up htmlIsBooleanAttr
This is used when serializing.
2025-05-11 23:28:40 +02:00
Nick Wellnhofer
825f3a9d0c html: Always serialize attributes with double quotes
Align with HTML5.
2025-05-11 21:42:51 +02:00
Nick Wellnhofer
5c4cc456a4 html: Escape encoding in meta tags 2025-05-11 21:30:30 +02:00
Nick Wellnhofer
0674ccb7cb html: Stop omitting end tags when serializing
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
05b8fe0a06 html: Don't escape RAWTEXT and PLAINTEXT
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
0d81d6f811 html: Use xmlOutputBufferWrite if possible 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
89fcfe3a29 html: Start to use xmlSerializeText
Avoid temporary copy to speed up serialization.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
777e2adf77 io: Consolidate escaping code
Use generated table approach of xmlSerializeText for xmlEscapeText.

Move most code to xmlIO.c.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
e0e0a1f0f5 html: Remove special handling of &{...} when serializing
See https://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1

Align with HTML5.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
971038e59f html: Call lower-level escaping functions
Removes the need to pass a document around.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
442c1903af doc: Fix some damage from automated conversions
Add some newlines, fix returns.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
46f05ea4d5 html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.

Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.

Add full support for <meta charset=""> which is also used when adding
meta tags.

Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.

Only switch encoding once when parsing.

Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading
UTF-8 charset.

Fixes #909.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
655ac5f851 html: Add comment regarding hack for XML documents 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
9bbffec568 doc: Move brief to top, params to bottom of doc comments 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
298f70b3d7 doc: Misc fixes to HTML tree docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
e549622bc5 doc: Convert documentation to Doxygen
Automated conversion based on a few regexes.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
69879da88f doc: Remove email addresses from documentation
Also remove authorship information from generated files, hash.c and
globals.c which were rewritten.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
61890e399d doc: Prepare for conversion to Doxygen
Fix many params in internal functions (not really necessary but Doxygen
warns about that in XML mode).

Fix formatting in a few corner cases that automatic conversion can't
handle.

Rearrange some DOC_DISABLE blocks.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
b349225952 include: Change some return types from int to enum
This also affects some new functions from 2.13.
2025-03-14 02:31:01 +01:00
Nick Wellnhofer
f68c70d298 html: Remove htmlSaveErr
This function is useless now.
2025-02-19 12:27:26 +01:00
Nick Wellnhofer
0315ac9390 html: Handle error from htmlFindOutputEncoder 2025-02-19 12:27:26 +01:00
Nick Wellnhofer
9c16a153d8 Revert "include: Make most IS_* macros private"
This reverts commit 84a6c82ff8.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
84a6c82ff8 include: Make most IS_* macros private
Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
2024-12-21 20:01:30 +01:00
Nick Wellnhofer
c34d0ae9cc html: Deprecate htmlIsBooleanAttr 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
a530ff125d io: Always consume encoding handler when creating output buffers
Also free encoding handler in error case.

Remove xmlAllocOutputBufferInternal which was identical to
xmlAllocOutputBuffer.
2024-07-29 14:25:39 +02:00
Nick Wellnhofer
a221cd7849 buf: Rework xmlBuf code
Always use what the old implementation called the "IO" allocation
scheme, allowing to move the content pointer past the initial
allocation. This is inexpensive and allows efficient shrinking.

Optimize xmlBufGrow, reusing shrunken memory as much as possible.

Simplify xmlBufAdd.

Make xmlBufBackToBuffer return an error on overflow.

Make "size" exclude the terminating NULL byte.

Always provide an initial size.

Reintroduce static buffers.

Remove xmlBufResize and several other functions.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
598ee0d2c6 error: Remove underscores from xmlRaiseError 2024-06-27 14:43:10 +02:00
Nick Wellnhofer
5b893fa999 encoding: Fix encoding lookup with xmlOpenCharEncodingHandler
Make xmlOpenCharEncodingHandler call xmlParseCharEncoding first so we
prefer our own handlers for names like "UTF8". Only UTF-16 needs an
exception.

Make callers check the return value. For UTF-8, a NULL encoding doesn't
mean an error.

Remove unnecessary UTF-8 check from htmlFindOutputEncoder. Don't try to
look up ASCII handler since the HTML handler is always available.

Fix return code of xmlParseCharEncoding.

Should fix #744.
2024-06-22 21:59:03 +02:00
Nick Wellnhofer
72e9267c32 html: Fix memory leak after malloc failure 2024-05-06 17:40:15 +02:00
Nick Wellnhofer
10c4ed1f2d html: Fix quadratic behavior in htmlNodeDump
Use an efficient buffer allocation scheme.
2024-03-15 19:47:08 +01:00
Nick Wellnhofer
3494aa4fd5 save: Cast return code of xmlBufNodeDump
Avoid implicit sign change.
2024-03-15 19:47:08 +01:00
Nick Wellnhofer
1d392fabb9 save: Check for output buffer errors
Report more error conditions.
2024-03-15 19:47:08 +01:00
Nick Wellnhofer
e314109ad1 save: Don't write directly to internal buffer
Make sure that OOM errors are reported.
2024-02-16 16:14:05 +01:00
Nick Wellnhofer
0821efc8ee encoding: Check whether encoding handlers support input/output
The "HTML" encoding handler doesn't support input which could lead to a
wrong error report.
2024-01-02 19:48:23 +01:00
Nick Wellnhofer
bc1e030664 save: Improve error handling
Handle malloc failrue from xmlRaiseError.

Use xmlRaiseMemoryError.

Stop using xmlGenericError.

Remove argument from memory error handler.

Remove TODO macro.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
abd74186f9 html: Report malloc failures
Fix many places where malloc failures aren't reported.

Stop checking for ctxt->instate.
2023-12-11 22:13:06 +01:00
Nick Wellnhofer
9b5cce7a71 include: Remove more unnecessary includes 2023-09-21 01:50:53 +02:00
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
76d6b0d768 html: Don't escape ASCII chars in href attributes
In several cases, href attributes can contain ASCII characters which are
illegal in URIs. Escaping them often does more harm than good.

Fixes #321.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
ad338ca737 Remove explicit integer casts
Remove explicit integer casts as final operation

- in assignments
- when passing arguments
- when returning values

Remove casts

- to the same type
- from certain range-bound values

The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.

Document some explicit casts as truncating and add a few missing ones.
2022-09-01 02:33:57 +02:00