mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00

155 Commits

Author SHA1 Message Date
Nick Wellnhofer
c5b45fbc07 doc: Misc fixes 2025-05-16 19:04:20 +02:00
Nick Wellnhofer
adfbeb7e08 doc: Stop using *Ptr typedefs in documentation 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a40f36e7f2 include: Stop using *Ptr typedefs in public headers 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
777e2adf77 io: Consolidate escaping code
Use generated table approach of xmlSerializeText for xmlEscapeText.

Move most code to xmlIO.c.
2025-05-11 20:29:25 +02:00
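The table-driven escaping mentioned in 777e2adf77 can be sketched as follows. This is a minimal illustration that assumes a hand-written 128-entry lookup table for ASCII bytes. `escape_table` and `escape_text` are hypothetical names, not libxml2's xmlSerializeText or xmlEscapeText, and the set of escaped characters is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical escape table: for each ASCII byte, the replacement string,
 * or NULL if the byte is copied unchanged. A generated table would be
 * produced at build time; this one is written by hand. */
static const char *const escape_table[128] = {
    ['&'] = "&amp;",
    ['<'] = "&lt;",
    ['>'] = "&gt;",
};

/* Escape the NUL-terminated string `in` into `out` (capacity `cap`).
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 if `out` is too small. */
static size_t escape_text(const char *in, char *out, size_t cap) {
    size_t len = 0;

    if (cap == 0)
        return (size_t)-1;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        /* One table lookup per byte; non-ASCII bytes pass through. */
        const char *rep = (*p < 128) ? escape_table[*p] : NULL;
        size_t n = rep ? strlen(rep) : 1;

        if (len + n + 1 > cap)
            return (size_t)-1;
        if (rep)
            memcpy(out + len, rep, n);
        else
            out[len] = (char)*p;
        len += n;
    }
    out[len] = '\0';
    return len;
}
```

For example, escaping `a<b & c>d` yields `a&lt;b &amp; c&gt;d`. The per-byte lookup replaces a chain of comparisons, which is the main appeal of the table approach.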
Nick Wellnhofer
cdaf657ffb html: Don't escape < and > when serializing attribute values
Align with HTML5.

This will break some test suites.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
e0e0a1f0f5 html: Remove special handling of &{...} when serializing
See https://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1

Align with HTML5.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
dad1163078 entities: Always replace invalid chars when escaping
The previous refactor painstakingly recreated the different behavior of
separate functions that were merged. It makes

Optimize IS_CHAR check for non-ASCII chars.
2025-05-11 20:29:25 +02:00
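The IS_CHAR check referenced in dad1163078 tests the `Char` production of XML 1.0. The sketch below takes an ASCII fast path and then tests the non-ASCII ranges with a few ordered comparisons. `is_xml_char` is a hypothetical name; this is not libxml2's macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Char production from XML 1.0 (Fifth Edition), section 2.2:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] */
static bool is_xml_char(uint32_t c) {
    if (c < 0x80)                          /* ASCII fast path */
        return c >= 0x20 || c == 0x9 || c == 0xA || c == 0xD;
    if (c <= 0xD7FF)
        return true;
    if (c < 0xE000)                        /* surrogates D800-DFFF */
        return false;
    if (c <= 0xFFFD)
        return true;
    return c >= 0x10000 && c <= 0x10FFFF;  /* excludes FFFE and FFFF */
}
```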
Nick Wellnhofer
63535d3922 tree: Make xmlNodeListGetStringInternal work with escape flags 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
9bbffec568 doc: Move brief to top, params to bottom of doc comments 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
714decd6d6 doc: Misc fixes to entities docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
cb1635a642 doc: Use @since command 2025-05-02 19:05:25 +02:00
Nick Wellnhofer
e78e05c990 doc: Fix autolinks to functions
Unfortunately, autolinks in .c files aren't converted by Doxygen for
some reason.
2025-05-02 17:45:31 +02:00
Nick Wellnhofer
e525564f65 doc: Remove empty lines at start of block
These lines were left over after automatic conversion.
2025-05-02 11:42:05 +02:00
Nick Wellnhofer
e549622bc5 doc: Convert documentation to Doxygen
Automated conversion based on a few regexes.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
69879da88f doc: Remove email addresses from documentation
Also remove authorship information from generated files, hash.c and
globals.c which were rewritten.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
61890e399d doc: Prepare for conversion to Doxygen
Fix many params in internal functions (not really necessary but Doxygen
warns about that in XML mode).

Fix formatting in a few corner cases that automatic conversion can't
handle.

Rearrange some DOC_DISABLE blocks.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
9c16a153d8 Revert "include: Make most IS_* macros private"
This reverts commit 84a6c82ff8.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
2e3a91a766 doc: Fix documentation 2024-12-26 21:05:39 +01:00
Nick Wellnhofer
84a6c82ff8 include: Make most IS_* macros private
Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
2024-12-21 20:01:30 +01:00
Nick Wellnhofer
3f72a579c2 entities: Check reallocations for overflow 2024-12-21 19:37:37 +01:00
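One common way to check a reallocation for overflow, as 3f72a579c2 describes, is to refuse to grow once doubling the capacity would wrap around `size_t`. A minimal sketch; `grow_buffer` and its initial capacity of 64 bytes are hypothetical, not libxml2's helpers.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: double `*cap` and realloc `buf`, refusing to proceed
 * if the multiplication would overflow size_t. On failure, returns NULL and
 * leaves both `buf` (still owned by the caller) and `*cap` unchanged. */
static void *grow_buffer(void *buf, size_t *cap) {
    size_t new_cap;
    void *tmp;

    if (*cap == 0)
        new_cap = 64;
    else if (*cap > SIZE_MAX / 2)
        return NULL;                 /* doubling would overflow */
    else
        new_cap = *cap * 2;

    tmp = realloc(buf, new_cap);
    if (tmp == NULL)
        return NULL;                 /* allocation failed; buf is still valid */
    *cap = new_cap;
    return tmp;
}
```

Because the old pointer survives a failed call, the caller can still free it and report the error instead of leaking or writing past the end.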
Nick Wellnhofer
89b9f45711 entities: Allow control chars when serializing HTML 2024-10-25 18:02:58 +02:00
Nick Wellnhofer
1cfc5b8089 entities: Rework serialization of numeric character references 2024-07-16 17:42:10 +02:00
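A numeric character reference serializer of the kind reworked in 1cfc5b8089 might look like the following sketch. It assumes hexadecimal references with uppercase digits (`&#x1F600;`); whether libxml2 emits hex or decimal references in a given context is not stated here, and `write_hex_char_ref` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `c` as a hexadecimal character reference into `out`, which must
 * hold at least 13 bytes (enough for any 32-bit value plus "&#x", ";"
 * and NUL). Returns the length written, excluding the terminating NUL. */
static size_t write_hex_char_ref(uint32_t c, char *out) {
    static const char digits[] = "0123456789ABCDEF";
    char tmp[8];
    size_t n = 0, len = 0;

    do {                             /* collect hex digits, least significant first */
        tmp[n++] = digits[c & 0xF];
        c >>= 4;
    } while (c != 0);

    out[len++] = '&';
    out[len++] = '#';
    out[len++] = 'x';
    while (n > 0)                    /* emit digits most significant first */
        out[len++] = tmp[--n];
    out[len++] = ';';
    out[len] = '\0';
    return len;
}
```

Writing digits by hand avoids a `snprintf` call per character, which matters when many characters in a document need to be escaped.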
Nick Wellnhofer
8d1606265d entities: Rework text escaping 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
b0fc67aa22 build: Remove --with-tree configuration option
This option would allow for a smaller, but mostly useless minimal build.
But it complicates the symbol availability logic in an insane way and
requires specialized tools like our custom C parser in doc/apibuild.py.

See #717.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
8318b5a634 parser: Fix NULL checks for output arguments 2024-06-09 15:08:43 +02:00
Nick Wellnhofer
f0d891585d entities: Unconst predefined entities
Partial revert of commit 63ce5f9a. For some reason, Chromium and WebKit
set the etype member of predefined entities. This should be fixed first.
2024-06-01 15:41:43 +02:00
Nick Wellnhofer
e75e878e02 doc: Update and fix documentation 2024-05-20 14:23:39 +02:00
Nick Wellnhofer
63ce5f9aed Make some globals const 2024-04-28 17:53:39 +02:00
Nick Wellnhofer
ee0c1f87c0 fuzz: New tree API fuzzer 2024-03-15 19:54:27 +01:00
Nick Wellnhofer
edbf1eb63b entities: Don't allow null name in xmlNewEntity 2024-03-15 19:47:08 +01:00
Nick Wellnhofer
50816b8d1a entities: Check for illegal entity types in xmlAddEntity 2024-03-15 19:47:08 +01:00
Nick Wellnhofer
ab345338a4 valid: Report malloc failure in legacy DTD serialization 2024-03-15 19:47:08 +01:00
Nick Wellnhofer
fbe10a466f save: Move DTD serialization code to xmlsave.c 2024-02-04 14:33:19 +01:00
Nick Wellnhofer
c2b3294f60 fuzz: Abort on invalid UTF-8
The parser should never generate invalid UTF-8 these days even in
recovery mode.
2024-01-04 21:20:51 +01:00
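Checking that parser output is well-formed UTF-8, as the fuzzer change in c2b3294f60 relies on, amounts to a validator like this sketch. It follows RFC 3629, rejecting overlong forms, surrogates, and values above U+10FFFF. `is_valid_utf8` is a hypothetical name, not the fuzzer's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s[0..len) is well-formed UTF-8 per RFC 3629. */
static bool is_valid_utf8(const unsigned char *s, size_t len) {
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;                    /* number of continuation bytes */
        unsigned long cp;

        if (c < 0x80) {
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return false;            /* stray continuation or invalid lead byte */
        }

        if (len - i <= n)
            return false;            /* truncated sequence */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates, and out-of-range values. */
        if ((n == 1 && cp < 0x80) || (n == 2 && cp < 0x800) ||
            (n == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += n + 1;
    }
    return true;
}
```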
Nick Wellnhofer
d025cfbb4b parser: Always copy content from entity to target.
Make sure that references from IDs are updated.

Note that if there are IDs with the same value in a document, the last
one will now be returned. IDs should be unique, but maybe this should be
addressed.
2023-12-29 01:22:11 +01:00
Nick Wellnhofer
a1f7ecaef8 entities: Report malloc failures
Fix places where malloc failures aren't reported.

Introduce new API function xmlAddEntity that returns separate error
codes.

Don't invoke global error handler for low-level errors which should be
handled by higher layers.

Invalid redeclaration warnings will be fixed later.
2023-12-11 22:05:47 +01:00
Nick Wellnhofer
713ded60ad entities: Make xmlFreeEntity public 2023-10-06 10:47:07 +02:00
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
9d80a2b134 entities: Don't change doc when encoding entities
doc->encoding shouldn't be touched by xmlEncodeEntitiesInternal.
2023-08-17 12:47:14 +02:00
Nick Wellnhofer
ce76ebfd13 entities: Stop counting entities
This was only used in the old version of xmlParserEntityCheck.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
463bbeeca1 entities: Rework entity amplification checks
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.

We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.

We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.

A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA work which have caused problems in the past.

The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.

Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.

Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.

Fixes #294.
Fixes #345.
2022-12-21 20:19:10 +01:00
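The accounting described in 463bbeeca1 can be illustrated with a small model. This is a sketch under stated assumptions, not libxml2's parser code: the function and constant names are hypothetical, "10 MB" is read as 10,000,000 bytes, and the fixed per-reference cost of 20 bytes is a placeholder.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical constants based on the numbers in the commit message;
 * libxml2's actual names, values, and accounting may differ. */
#define MAX_AMPLIFICATION 10UL        /* substituted output <= 10x input */
#define ALWAYS_ALLOWED    10000000UL  /* "10 MB", read as 10^7 bytes */
#define ENTITY_REF_COST   20UL        /* assumed fixed cost per reference */

/* Saturating arithmetic: clamp at ULONG_MAX instead of wrapping, since
 * unsigned long may be only 32 bits wide. */
static unsigned long sat_add(unsigned long a, unsigned long b) {
    return (a > ULONG_MAX - b) ? ULONG_MAX : a + b;
}

static unsigned long sat_mul(unsigned long a, unsigned long b) {
    return (b != 0 && a > ULONG_MAX / b) ? ULONG_MAX : a * b;
}

/* Account for one entity reference whose replacement text is `copy_size`
 * bytes long. `input_size` is the document size before expansion and
 * `*expanded` accumulates all substituted bytes. Returns false once the
 * total exceeds both the fixed allowance and the linear bound. */
static bool check_entity_amplification(unsigned long input_size,
                                       unsigned long *expanded,
                                       unsigned long copy_size) {
    *expanded = sat_add(*expanded, sat_add(copy_size, ENTITY_REF_COST));
    if (*expanded <= ALWAYS_ALLOWED)
        return true;
    return *expanded <= sat_mul(input_size, MAX_AMPLIFICATION);
}
```

Under these assumptions, a 1 KB document that expands to 20 MB is rejected, while a 5 MB document may expand to 20 MB because that stays within ten times its size. Charging a fixed cost per reference means many references to an empty entity still add up, which is how this model replaces counting entities.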
Nick Wellnhofer
f34f184f8e entities: Add "flags" member to struct xmlEntity
This will hold various flags and eventually replace the "checked"
member.
2022-12-19 15:24:53 +01:00
Nick Wellnhofer
2059df5358 buf: Deprecate static/immutable buffers 2022-11-20 21:16:03 +01:00
Nick Wellnhofer
644a89e080 [CVE-2022-40304] Fix dict corruption caused by entity reference cycles
When an entity reference cycle is detected, the entity content is
cleared by setting its first byte to zero. But the entity content might
be allocated from a dict. In this case, the dict entry becomes corrupted
leading to all kinds of logic errors, including memory errors like
double-frees.

Stop storing entity content, orig, ExternalID and SystemID in a dict.
These values are unlikely to occur multiple times in a document, so they
shouldn't have been stored in a dict in the first place.

Thanks to Ned Williamson and Nathan Wachholz working with Google Project
Zero for the report!
2022-10-14 15:02:06 +02:00
Nick Wellnhofer
2cac626976 Don't use sizeof(xmlChar) or sizeof(char) 2022-09-01 03:35:19 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
f550977295 Fix documentation in entities.c 2022-02-20 22:06:16 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
ce0871e15c Only warn on invalid redeclarations of predefined entities
Downgrade the error message to a warning since the error was ignored,
anyway. Also print the name of redeclared entity. For a proper fix that
also shows filename and line number of the invalid redeclaration, we'd
have to

- pass the parser context to the entity functions somehow, or
- make these functions return distinct error codes.

Partial fix for #308.
2022-02-20 21:49:04 +01:00