mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00

1388 Commits

Author SHA1 Message Date
Nick Wellnhofer
e7802738c6 parser: Don't load external content if only XML_SKIP_IDS is set
At some point, the `loadsubset` member was augmented to also control
handling of ID attributes in addition to loading of external DTDs. These
two features are unrelated and shouldn't have been mixed. This mistake
was probably inspired by the misnamed XML_DETECT_IDS flag. As a side
effect, setting XML_SKIP_IDS always enabled loading of external DTDs and
parameter entities.

This change makes it possible to ignore IDs without loading external
content. This is a deliberate API change that improves security and is
unlikely to affect users.

This also makes sure that the new XML_PARSE_SKIP_IDS option doesn't
enable unsafe behavior.
2025-06-22 15:18:43 +02:00
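The side effect described in e7802738c6 can be illustrated with a small standalone sketch. It does not use libxml2 and is not the actual patch: the `SKETCH_*` names and both helper functions are hypothetical, and the flag values are assumed to mirror the legacy `loadsubset` bits in `parser.h`. It only shows why treating "any bit set" as "load external content" made the skip-IDs bit unsafe, and how masking that bit out separates the two features.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the legacy loadsubset bits.
 * The values are an assumption; check parser.h in your libxml2 version. */
#define SKETCH_DETECT_IDS     2
#define SKETCH_COMPLETE_ATTRS 4
#define SKETCH_SKIP_IDS       8

/* Old behavior as described in the commit message: any non-zero
 * value, including the skip-IDs bit alone, enabled external loading. */
static bool load_external_old(int loadsubset) {
    return loadsubset != 0;
}

/* Sketch of the separated behavior: the skip-IDs bit is masked out
 * before deciding whether external DTDs and PEs may be loaded. */
static bool load_external_new(int loadsubset) {
    return (loadsubset & ~SKETCH_SKIP_IDS) != 0;
}

/* ID handling is decided by its own bit only. */
static bool skip_ids(int loadsubset) {
    return (loadsubset & SKETCH_SKIP_IDS) != 0;
}
```

With only `SKETCH_SKIP_IDS` set, `load_external_old` returns true while `load_external_new` returns false, which is the security-relevant difference the commit message describes.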
Nick Wellnhofer
7e3818424a include: s/char const/const char/ 2025-06-22 14:31:26 +02:00
Nick Wellnhofer
19139061fb include: Define XMLPUBVAR directly
Using an intermediate macro confuses newer Doxygen versions for some
reason.
2025-06-22 14:31:26 +02:00
Nick Wellnhofer
a4d25b3d93 doc: Small fixes 2025-06-22 14:31:26 +02:00
Michael Mann
cf4f967266 Add XML_PARSE_SKIP_IDS to replace XML_SKIP_IDS
Mark loadsubset member as deprecated

Fixes #873
2025-06-22 08:03:34 -04:00
Nick Wellnhofer
2963a0f13a tree: Undeprecate some members used by libxslt 2025-06-20 21:41:24 +02:00
Nick Wellnhofer
7e08d93c94 doc: Improve documentation of tree data types 2025-06-08 14:22:32 +02:00
Nick Wellnhofer
2b6b3945f2 Revert "SAX1: Align handling of default attributes with SAX2"
This reverts commit db65b2fc51.

This didn't check for duplicate default attributes.
2025-06-03 16:21:56 +02:00
Nick Wellnhofer
5e7c72cd5c doc: Misc fixes 2025-06-03 01:27:12 +02:00
Nick Wellnhofer
5f8e537d0a doc: Misc fixes to xpointer 2025-06-03 00:59:00 +02:00
Nick Wellnhofer
0ab5d7c557 entities: Deprecate internal DTD-related functions 2025-06-03 00:13:26 +02:00
Nick Wellnhofer
347c2b2ec7 valid: Deprecate a few functions and xmllint --insert 2025-06-02 23:54:28 +02:00
Nick Wellnhofer
7bd8d1d9cc doc: Prefix autolinks with '#'
Use `#func` instead of `func()` to ignore parameters and make all
autolinks work.
2025-05-28 16:01:52 +02:00
Nick Wellnhofer
6e33d136e1 error: Fix initGenericErrorDefaultFunc compatibility macro again
Now it really should work as before.
2025-05-28 14:57:37 +02:00
Nick Wellnhofer
30cf6d0980 parser: Add XML_INPUT_USE_SYS_CATALOG
Also clean up catalog resolution and add error handling using the
global error.

Don't try to look up the resolved URI a second time.

Add some comments. Fix documentation.
2025-05-26 16:51:59 +02:00
Nick Wellnhofer
4dc44c83ab parser: Rework entity boundary check for element content
Only use depth of input stack. This makes the input ID unused
internally.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
db65b2fc51 SAX1: Align handling of default attributes with SAX2
The SAX1 parser is legacy code, but it seems more maintainable to align
it with SAX2.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
2f3655c9c3 parser: Pop PEs that start markup declarations explicitly
We currently only handle "Validity constraint: Proper Declaration/PE
Nesting", but we must detect "Well-formedness constraint: PE Between
Declarations" separately:

> The replacement text of a parameter entity reference in a DeclSep must
> match the production extSubsetDecl.

PEs in DeclSeps are PEs that start with a full markup declaration (or
another PE). These are handled in xmlParse{Internal|External}Subset. We
set a flag on these PEs and don't close them implicitly in
xmlSkipBlankCharsPE. This will make unterminated declarations in such
PEs cause a parser error. The PEs are closed explicitly in
xmlParse{Internal|External}Subset, the only location where they are
allowed to end.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
dd1961e0d8 valid: Skip more validity checks if not validating 2025-05-25 14:26:30 +02:00
Nick Wellnhofer
fca0860d6c tree: Deprecate public struct members related to DTDs
Let's deprecate these members for now. If these are really used, they
can be undeprecated later.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
7c9b55356d doc: Document unused error domains 2025-05-19 20:07:54 +02:00
Nick Wellnhofer
7008740a96 parser: Consolidate scanning of XML Names
Use new productions by default.

Fixes #194.
Fixes #364.
See #707.
2025-05-19 19:58:33 +02:00
Nick Wellnhofer
210f5a3746 chvalid: Mark functions as deprecated 2025-05-16 23:27:51 +02:00
Nick Wellnhofer
954aae907d doc: Improve regexp documentation 2025-05-16 21:13:17 +02:00
Nick Wellnhofer
c5b45fbc07 doc: Misc fixes 2025-05-16 19:04:20 +02:00
Nick Wellnhofer
c4926b19d3 codegen: Merge xmlunicode.c into xmlregexp.c
Include generated parts.

Generate xmlChRangeGroups instead of functions for Unicode blocks.
2025-05-16 19:04:20 +02:00
Nick Wellnhofer
4cb767e96e codegen: Only generate tables for character ranges
The rest can be easily maintained manually.
2025-05-16 19:04:20 +02:00
Nick Wellnhofer
6f4b452742 parser: Stop using ctxt->linenumbers
I think this was used to avoid setting the `line` member before it was
added (20+ years ago).
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a05fa9a905 codegen: Rerun codegen scripts 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a40f36e7f2 include: Stop using *Ptr typedefs in public headers 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
2d83a84ca6 doc: Misc improvements 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
f0983199e8 html: Map some encodings according to HTML5
Windows-1252 is a superset of ISO-8859-1 and should be used instead.
Same for ASCII.

Also map UCS-2 and UTF-16 to UTF-16LE.
2025-05-12 14:04:30 +02:00
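The mapping described in f0983199e8 amounts to a small label lookup. The following is a minimal sketch of that idea, not the library's code: `html_map_encoding` and `ascii_ieq` are hypothetical names, and the exact set of aliases in the table (for example `US-ASCII`) is an assumption beyond what the commit message lists.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two ASCII labels. */
static int ascii_ieq(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Returns the encoding an HTML parser would use for the given label,
 * following the substitutions named in the commit message.
 * Labels without a substitution are returned unchanged. */
const char *html_map_encoding(const char *label) {
    static const struct { const char *from; const char *to; } map[] = {
        { "ISO-8859-1", "windows-1252" },
        { "ASCII",      "windows-1252" },
        { "US-ASCII",   "windows-1252" }, /* assumed alias */
        { "UCS-2",      "UTF-16LE" },
        { "UTF-16",     "UTF-16LE" },
    };
    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (ascii_ieq(label, map[i].from))
            return map[i].to;
    }
    return label;
}
```

For example, `html_map_encoding("iso-8859-1")` yields `"windows-1252"`, while a label such as `"UTF-8"` passes through untouched.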
Nick Wellnhofer
628006f457 encoding: Add windows-1252
Fixes #915.
2025-05-12 13:27:22 +02:00
Nick Wellnhofer
f602c0c186 html: Rework serialization of meta encoding attributes
Don't allocate memory.
2025-05-12 00:05:02 +02:00
Nick Wellnhofer
0674ccb7cb html: Stop omitting end tags when serializing
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
05b8fe0a06 html: Don't escape RAWTEXT and PLAINTEXT
Align with HTML5.
2025-05-11 20:57:07 +02:00
Nick Wellnhofer
777e2adf77 io: Consolidate escaping code
Use generated table approach of xmlSerializeText for xmlEscapeText.

Move most code to xmlIO.c.
2025-05-11 20:29:25 +02:00
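A table-driven escaper of the kind 777e2adf77 refers to can be sketched as follows. This is an illustration under stated assumptions, not the code in xmlIO.c: the table here is written by hand rather than generated, covers only four characters, and `escape_text` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Lookup table indexed by ASCII byte value.
 * A NULL entry means the byte is copied unchanged. */
static const char *const escape_tab[128] = {
    ['&'] = "&amp;",
    ['<'] = "&lt;",
    ['>'] = "&gt;",
    ['"'] = "&quot;",
};

/* Escapes src into dst, which holds cap bytes including the
 * terminating NUL. Returns the number of bytes written (without
 * the NUL), or (size_t)-1 if dst is too small. */
size_t escape_text(const char *src, char *dst, size_t cap) {
    size_t out = 0;

    if (cap == 0)
        return (size_t)-1;
    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;
        const char *rep = c < 128 ? escape_tab[c] : NULL;
        size_t n = rep ? strlen(rep) : 1;

        if (out + n >= cap) /* keep one byte for the NUL */
            return (size_t)-1;
        if (rep)
            memcpy(dst + out, rep, n);
        else
            dst[out] = (char)c;
        out += n;
    }
    dst[out] = '\0';
    return out;
}
```

The per-byte work is a single table lookup instead of a chain of comparisons, and adding another escaped character only means adding a table entry.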
Nick Wellnhofer
dad1163078 entities: Always replace invalid chars when escaping
The previous refactor painstakingly recreated the different behavior of
separate functions that were merged. It makes

Optimize IS_CHAR check for non-ASCII chars.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
971038e59f html: Call lower-level escaping functions
Removes the need to pass a document around.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
63535d3922 tree: Make xmlNodeListGetStringInternal work with escape flags 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
442c1903af doc: Fix some damage from automated conversions
Add some newlines, fix returns.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
98a61c9dff doc: Fix briefs in tree docs 2025-05-11 20:29:25 +02:00
Nick Wellnhofer
46f05ea4d5 html: Rework meta charset handling
Don't use encoding from meta tags when serializing. Only use the value
in `doc->encoding`, matching the XML serializer. This is the actual
encoding used when parsing.

Stop modifying the input document by setting meta tags before
serializing. Meta tags are now injected during serialization.

Add full support for <meta charset=""> which is also used when adding
meta tags.

Align with HTML5 and implement the "algorithm for extracting a character
encoding from a meta element". Only modify the encoding substring in
Content-Type meta tags.

Only switch encoding once when parsing.

Fix htmlSaveFileFormat so that a NULL encoding no longer declares a
misleading UTF-8 charset.

Fixes #909.
2025-05-11 20:29:25 +02:00
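The HTML5 "algorithm for extracting a character encoding from a meta element" mentioned in 46f05ea4d5 can be sketched in a few lines. This is a standalone reading of the specification's steps, not the libxml2 implementation: `meta_extract_charset` and `is_ws` are hypothetical names, and the final step of mapping the extracted label to a known encoding is left out. Returning a pointer and length into the original string also shows how only the encoding substring of a Content-Type value could be located for replacement.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII whitespace as defined by the HTML specification. */
static int is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* Locates the charset value inside a Content-Type style string.
 * On success, returns a pointer into s and stores the value's
 * length in *len. Returns NULL if no charset value is found. */
const char *meta_extract_charset(const char *s, size_t *len) {
    const char *p = s;
    const char *end;

    for (;;) {
        /* Find the next case-insensitive occurrence of "charset". */
        const char *q = p;
        for (; *q; q++) {
            size_t i = 0;
            while (i < 7 && q[i] != '\0' &&
                   tolower((unsigned char)q[i]) == "charset"[i])
                i++;
            if (i == 7)
                break;
        }
        if (*q == '\0')
            return NULL;
        p = q + 7;
        while (is_ws(*p))
            p++;
        if (*p != '=')
            continue; /* not an assignment, resume the search here */
        p++;
        break;
    }
    while (is_ws(*p))
        p++;
    if (*p == '"' || *p == '\'') {
        /* Quoted value: requires a matching closing quote. */
        end = strchr(p + 1, *p);
        if (end == NULL)
            return NULL;
        *len = (size_t)(end - (p + 1));
        return p + 1;
    }
    if (*p == '\0')
        return NULL;
    /* Unquoted value: runs up to whitespace, ';' or end of string. */
    end = p;
    while (*end && !is_ws(*end) && *end != ';')
        end++;
    *len = (size_t)(end - p);
    return p;
}
```

For the input `text/html; charset=utf-8` the function returns a pointer to `utf-8` with a length of 5. An unterminated quote, as in `charset='utf-8`, yields NULL.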
Nick Wellnhofer
38ea8fa9de doc: Fix varargs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
9bbffec568 doc: Move brief to top, params to bottom of doc comments 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
ab13fbfd68 doc: Misc fixes to error docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
b1685459a3 doc: Misc fixes to xmlsave docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
298f70b3d7 doc: Misc fixes to HTML tree docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
80b6429fb3 doc: Misc fixes to encoding docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
81ac2e27fd doc: Misc fixes to valid docs 2025-05-06 19:51:38 +02:00