libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-21 14:53:44 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	6a6a46f017	doc: Fix autolink errors Fix links, remove links to internal functions.	2025-05-28 16:02:41 +02:00
Nick Wellnhofer	7bd8d1d9cc	doc: Prefix autolinks with '#' Use `#func` instead of `func()` to ignore parameters and make all autolinks work.	2025-05-28 16:01:52 +02:00
Nick Wellnhofer	adfbeb7e08	doc: Stop using *Ptr typedefs in documentation	2025-05-16 18:03:12 +02:00
Nick Wellnhofer	a40f36e7f2	include: Stop using *Ptr typedefs in public headers	2025-05-16 18:03:12 +02:00
Nick Wellnhofer	af4fae5ae3	html: Add some comments regarding HTML5 serialization It seems that the specification of the HTML output method in XSLT 1.0 had a lot of influence on how the HTML serializer in libxml2 ended up: https://www.w3.org/TR/xslt-10/#section-HTML-Output-Method There are two remaining behaviors suggested by XSLT 1.0 that don't match the HTML5 fragment serialization algorithm: We escape non-ASCII characters in URI attributes (the list of which is probably outdated). This was originally recommended in appendix B of the HTML 4.01 spec, but only for user agents: https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.1 From my experience, any tool that processes HTML should escape as little as possible. For example, we used to escape many more characters which are invalid in URIs, but often used in template languages. (Note that we still escape whitespace and control chars.) Nevertheless, I guess that some libxslt users continue to expect this behavior from libxml2. Then we collapse Boolean attributes using an outdated list. This is mostly a cosmetic issue, but a somewhat important one for libxslt users. We probably need a serialization option for the xmlsave module that enables fully HTML5-conformant output.	2025-05-13 23:00:51 +02:00
Nick Wellnhofer	fcb7a777ce	io: Make xmlOutputBufferCreate* not free encoder on error Revert `a530ff12` which was an inadvertent API change.	2025-05-13 22:44:42 +02:00
Nick Wellnhofer	5b71dca613	Fix -Wunterminated-string-initialization warnings Don't use strings for table.	2025-05-12 21:58:06 +02:00
Nick Wellnhofer	c2929b5dd3	html: Ignore namespaces when handling meta tags Revert to old behavior to fix issues with XHTML documents.	2025-05-12 21:01:35 +02:00
Nick Wellnhofer	f602c0c186	html: Rework serialization of meta encoding attributes Don't allocate memory.	2025-05-12 00:05:02 +02:00
Nick Wellnhofer	7654c2efc0	html: Rework serialization of URIs Don't allocate memory.	2025-05-12 00:04:00 +02:00
Nick Wellnhofer	bd777e4f42	html: Speed up htmlIsBooleanAttr This is used when serializing.	2025-05-11 23:28:40 +02:00
Nick Wellnhofer	825f3a9d0c	html: Always serialize attributes with double quotes Align with HTML5.	2025-05-11 21:42:51 +02:00
Nick Wellnhofer	5c4cc456a4	html: Escape encoding in meta tags	2025-05-11 21:30:30 +02:00
Nick Wellnhofer	0674ccb7cb	html: Stop omitting end tags when serializing Align with HTML5.	2025-05-11 20:57:07 +02:00
Nick Wellnhofer	05b8fe0a06	html: Don't escape RAWTEXT and PLAINTEXT Align with HTML5.	2025-05-11 20:57:07 +02:00
Nick Wellnhofer	0d81d6f811	html: Use xmlOutputBufferWrite if possible	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	89fcfe3a29	html: Start to use xmlSerializeText Avoid temporary copy to speed up serialization.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	777e2adf77	io: Consolidate escaping code Use generated table approach of xmlSerializeText for xmlEscapeText. Move most code to xmlIO.c.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	e0e0a1f0f5	html: Remove special handling of &{...} when serializing See https://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1 Align with HTML5.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	971038e59f	html: Call lower-level escaping functions Removes the need to pass a document around.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	442c1903af	doc: Fix some damage from automated conversions Add some newlines, fix returns.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	46f05ea4d5	html: Rework meta charset handling Don't use encoding from meta tags when serializing. Only use the value in `doc->encoding`, matching the XML serializer. This is the actual encoding used when parsing. Stop modifying the input document by setting meta tags before serializing. Meta tags are now injected during serialization. Add full support for <meta charset=""> which is also used when adding meta tags. Align with HTML5 and implement the "algorithm for extracting a character encoding from a meta element". Only modify the encoding substring in Content-Type meta tags. Only switch encoding once when parsing. Fix htmlSaveFileFormat with a NULL encoding not to declare a misleading UTF-8 charset. Fixes #909.	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	655ac5f851	html: Add comment regarding hack for XML documents	2025-05-11 20:29:25 +02:00
Nick Wellnhofer	9bbffec568	doc: Move brief to top, params to bottom of doc comments	2025-05-06 19:51:38 +02:00
Nick Wellnhofer	298f70b3d7	doc: Misc fixes to HTML tree docs	2025-05-06 19:51:38 +02:00
Nick Wellnhofer	e549622bc5	doc: Convert documentation to Doxygen Automated conversion based on a few regexes.	2025-05-01 23:23:42 +02:00
Nick Wellnhofer	69879da88f	doc: Remove email addresses from documentation Also remove authorship information from generated files, hash.c and globals.c which were rewritten.	2025-05-01 23:23:42 +02:00
Nick Wellnhofer	61890e399d	doc: Prepare for conversion to Doxygen Fix many params in internal functions (not really necessary but Doxygen warns about that in XML mode). Fix formatting in a few corner cases that automatic conversion can't handle. Rearrange some DOC_DISABLE blocks.	2025-05-01 23:23:42 +02:00
Nick Wellnhofer	b349225952	include: Change some return types from int to enum This also affects some new functions from 2.13.	2025-03-14 02:31:01 +01:00
Nick Wellnhofer	f68c70d298	html: Remove htmlSaveErr This function is useless now.	2025-02-19 12:27:26 +01:00
Nick Wellnhofer	0315ac9390	html: Handle error from htmlFindOutputEncoder	2025-02-19 12:27:26 +01:00
Nick Wellnhofer	9c16a153d8	Revert "include: Make most IS_* macros private" This reverts commit `84a6c82ff8`.	2025-02-13 20:20:17 +01:00
Nick Wellnhofer	84a6c82ff8	include: Make most IS_* macros private Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.	2024-12-21 20:01:30 +01:00
Nick Wellnhofer	c34d0ae9cc	html: Deprecate htmlIsBooleanAttr	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	a530ff125d	io: Always consume encoding handler when creating output buffers Also free encoding handler in error case. Remove xmlAllocOutputBufferInternal which was identical to xmlAllocOutputBuffer.	2024-07-29 14:25:39 +02:00
Nick Wellnhofer	a221cd7849	buf: Rework xmlBuf code Always use what the old implementation called the "IO" allocation scheme, allowing to move the content pointer past the initial allocation. This is inexpensive and allows efficient shrinking. Optimize xmlBufGrow, reusing shrunken memory as much as possible. Simplify xmlBufAdd. Make xmlBufBackToBuffer return an error on overflow. Make "size" exclude the terminating NULL byte. Always provide an initial size. Reintroduce static buffers. Remove xmlBufResize and several other functions.	2024-07-16 17:42:10 +02:00
Nick Wellnhofer	598ee0d2c6	error: Remove underscores from xmlRaiseError	2024-06-27 14:43:10 +02:00
Nick Wellnhofer	5b893fa999	encoding: Fix encoding lookup with xmlOpenCharEncodingHandler Make xmlOpenCharEncodingHandler call xmlParseCharEncoding first so we prefer our own handlers for names like "UTF8". Only UTF-16 needs an exception. Make callers check the return value. For UTF-8, a NULL encoding doesn't mean an error. Remove unnecessary UTF-8 check from htmlFindOutputEncoder. Don't try to look up ASCII handler since the HTML handler is always available. Fix return code of xmlParseCharEncoding. Should fix #744.	2024-06-22 21:59:03 +02:00
Nick Wellnhofer	72e9267c32	html: Fix memory leak after malloc failure	2024-05-06 17:40:15 +02:00
Nick Wellnhofer	10c4ed1f2d	html: Fix quadratic behavior in htmlNodeDump Use an efficient buffer allocation scheme.	2024-03-15 19:47:08 +01:00
Nick Wellnhofer	3494aa4fd5	save: Cast return code of xmlBufNodeDump Avoid implicit sign change.	2024-03-15 19:47:08 +01:00
Nick Wellnhofer	1d392fabb9	save: Check for output buffer errors Report more error conditions.	2024-03-15 19:47:08 +01:00
Nick Wellnhofer	e314109ad1	save: Don't write directly to internal buffer Make sure that OOM errors are reported.	2024-02-16 16:14:05 +01:00
Nick Wellnhofer	0821efc8ee	encoding: Check whether encoding handlers support input/output The "HTML" encoding handler doesn't support input which could lead to a wrong error report.	2024-01-02 19:48:23 +01:00
Nick Wellnhofer	bc1e030664	save: Improve error handling Handle malloc failure from xmlRaiseError. Use xmlRaiseMemoryError. Stop using xmlGenericError. Remove argument from memory error handler. Remove TODO macro.	2023-12-21 15:02:24 +01:00
Nick Wellnhofer	abd74186f9	html: Report malloc failures Fix many places where malloc failures aren't reported. Stop checking for ctxt->instate.	2023-12-11 22:13:06 +01:00
Nick Wellnhofer	9b5cce7a71	include: Remove more unnecessary includes	2023-09-21 01:50:53 +02:00
Nick Wellnhofer	699299cae3	globals: Stop including globals.h	2023-09-20 22:07:40 +02:00
Nick Wellnhofer	76d6b0d768	html: Don't escape ASCII chars in href attributes In several cases, href attributes can contain ASCII characters which are illegal in URIs. Escaping them often does more harm than good. Fixes #321.	2022-11-20 21:16:03 +01:00
Nick Wellnhofer	ad338ca737	Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.	2022-09-01 02:33:57 +02:00

177 Commits