1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-21 14:53:44 +03:00
Commit Graph

387 Commits

Author SHA1 Message Date
Nick Wellnhofer
8689523ad9 parser: Implement xmlCtxtGetInputWindow
See #762.
2025-07-31 15:20:20 +02:00
Nick Wellnhofer
469c847f4d parser: Split out xmlParserInputGetWindow 2025-07-31 14:23:23 +02:00
Nick Wellnhofer
8aaa53d712 parser: Implement xmlCtxtGetInputPosition
See #762.
2025-07-31 14:23:23 +02:00
Nick Wellnhofer
144ed959a5 parser: Move xmlSaturatedAdd to private header 2025-07-31 14:23:23 +02:00
Nick Wellnhofer
a7fc9e1add parser: Add more parser context accessors
The only thing remaining is access to parser input, see #762.
2025-07-31 14:23:23 +02:00
Nick Wellnhofer
7a41b18c62 parser: Remove xmlHaltParser
Always halt the parser on resource limit and entity loop errors and
remove the remaining calls which seem unnecessary.
2025-07-31 14:23:23 +02:00
Nick Wellnhofer
cdf4c6f1a2 doc: Mention XML_PARSE_NOERROR in more places 2025-07-31 14:23:23 +02:00
Nick Wellnhofer
7c91385040 parser: Remove unnecessary dict checks when freeing strings
The following strings are never allocated from a dict:

- xmlParserCtxt.version
- xmlParserCtxt.encoding
- xmlParserCtxt.extSubURI
- xmlParserCtxt.extSubSystem
- xmlDoc.version
- xmlDoc.encoding
- xmlDoc.URL
- xmlDTD.ExternalID
- xmlDTD.SystemID
- xmlID.value

Also make the struct members point to non-const chars to avoid casts
when freeing.
2025-06-23 22:59:31 +02:00
Nick Wellnhofer
a4d25b3d93 doc: Small fixes 2025-06-22 14:31:26 +02:00
Nick Wellnhofer
1dcd3df20b parser: Fix xmlCtxtIsStopped
Make xmlCtxtIsStopped check for fatal errors as well. This makes it
easier to migrate away from disableSAX.
2025-06-20 23:46:46 +02:00
Nick Wellnhofer
70335c41fc html: Don't stop on unsupported encoding
Continue to parse unlike in the XML case.
2025-06-08 14:22:32 +02:00
Nick Wellnhofer
7bd8d1d9cc doc: Prefix autolinks with '#'
Use `#func` instead of `func()` to ignore parameters and make all
autolinks work.
2025-05-28 16:01:52 +02:00
Nick Wellnhofer
30cf6d0980 parser: Add XML_INPUT_USE_SYS_CATALOG
Also clean up catalog resolution and add error handling using the
global error.

Don't try to look up the resolved URI a second time.

Add some comments. Fix documentation.
2025-05-26 16:51:59 +02:00
Nick Wellnhofer
34bafa14fe parser: Use parser context as default in resource loader
This allows to access the original context for example when using
modules like XInclude or schemas.
2025-05-25 22:47:34 +02:00
Nick Wellnhofer
6f4b452742 parser: Stop using ctxt->linenumbers
I think this was used to avoid setting the `line` member before it was
added (20+ years ago).
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
adfbeb7e08 doc: Stop using *Ptr typedefs in documentation 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a40f36e7f2 include: Stop using *Ptr typedefs in public headers 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
2d83a84ca6 doc: Misc improvements 2025-05-16 18:03:12 +02:00
Nick Wellnhofer
cdce17c3cb html: Only map HTML encodings from meta tag 2025-05-12 21:21:25 +02:00
Nick Wellnhofer
f0983199e8 html: Map some encodings according to HTML5
Windows-1252 is a superset of ISO-8859-1 and should be used instead.
Same for ASCII.

Also map UCS-2 and UTF-16 to UTF-16LE.
2025-05-12 14:04:30 +02:00
Nick Wellnhofer
442c1903af doc: Fix some damage from automated conversions
Add some newlines, fix returns.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
38ea8fa9de doc: Fix varargs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
9bbffec568 doc: Move brief to top, params to bottom of doc comments 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
ab13fbfd68 doc: Misc fixes to error docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
1bf44f09ba doc: Misc fixes to parser docs 2025-05-06 19:51:38 +02:00
Nick Wellnhofer
cb1635a642 doc: Use @since command 2025-05-02 19:05:25 +02:00
Nick Wellnhofer
e78e05c990 doc: Fix autolinks to functions
Unfortunately, autolinks in .c files aren't converted by Doxygen for
some reason.
2025-05-02 17:45:31 +02:00
Nick Wellnhofer
1eca6e3476 parser: Deprecate xmlClearParserCtxt 2025-05-02 13:33:35 +02:00
Nick Wellnhofer
e525564f65 doc: Remove empty lines at start of block
These lines were left over after automatic conversion.
2025-05-02 11:42:05 +02:00
Nick Wellnhofer
e549622bc5 doc: Convert documentation to Doxygen
Automated conversion based on a few regexes.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
69879da88f doc: Remove email addresses from documentation
Also remove authorship information from generated files, hash.c and
globals.c which were rewritten.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
61890e399d doc: Prepare for conversion to Doxygen
Fix many params in internal functions (not really necessary but Doxygen
warns about that in XML mode).

Fix formatting in a few corner cases that automatic conversion can't
handle.

Rearrange some DOC_DISABLE blocks.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
fc8899d47c parser: Make xmlCtxtGetValidCtxt depend on VALID_ENABLED 2025-04-27 13:01:42 +02:00
Nick Wellnhofer
b85d77d156 http: Remove built-in HTTP client
Stubs are retained for ABI compatibility.

Fixes #631.
Obsoletes #160.
2025-04-20 18:21:06 +02:00
Nick Wellnhofer
9e3159d04c parser: Never use XML catalogs when parsing HTML files
When loading HTML files we shouldn't try to resolve URIs using the XML
catalogs.
2025-04-19 14:52:14 +02:00
Nick Wellnhofer
b349225952 include: Change some return types from int to enum
This also affects some new functions from 2.13.
2025-03-14 02:31:01 +01:00
Nick Wellnhofer
fd1b939168 include: Convert some macros to enums 2025-03-14 00:35:40 +01:00
Nick Wellnhofer
84c6524e26 encoding: Support input-only and output-only converters
Make it possible to open an encoding handler only for input or output.
This avoids the creation of unnecessary converters.

Should also fix #863.
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
69b83bb68e encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.

It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.

Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
25490528af parser: Fix spurious error in SAX mode
Short-lived regression from 5f0b1378.
2025-03-11 16:34:30 +01:00
Nick Wellnhofer
5f0b1378d7 parser: Add more parser context accessors
Fixes #763.
2025-03-08 22:36:06 +01:00
Nick Wellnhofer
6bb2ea8e70 html: Adjust xmlDetectEncoding for HTML
Don't check for UTF-32 or EBCDIC.

We now perform BOM sniffing and the first step of the HTML5 prescan
algorithm (detect UTF-16 XML declarations). The rest of the algorithm
still has to be implemented.
2025-02-02 11:15:44 +01:00
Nick Wellnhofer
0de90f518d parser: Define SIZE_MAX 2025-01-30 01:25:31 +01:00
Nick Wellnhofer
3eced32ea3 parser: Fix push parser with encoding and single chunk
When push-parsing with an encoding handler, we must convert the whole
buffer in the initial conversion. Otherwise, parsing a single chunk
larger than ~4KB would fail.

Regressed with commit 34c9108f.
2025-01-30 00:02:34 +01:00
Nick Wellnhofer
1082d813e8 parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
2025-01-29 00:49:57 +01:00
Nick Wellnhofer
a78843be5e xmllint: Support compressed input from stdin
Another regression related to reading from stdin.

Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.

This now enables compressed input when using the "Fd" API functions
which wan't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.

Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
2025-01-28 23:20:37 +01:00
Nick Wellnhofer
2e3a91a766 doc: Fix documentation 2024-12-26 21:05:39 +01:00
Nick Wellnhofer
8231c03663 parser: Check reallocations for overflow 2024-12-21 19:37:37 +01:00
Nick Wellnhofer
0dd910e82b save: Fix handling of catastrophic errors
Don't overwrite catastrophic errors xmlSaveErr.

Overwrite non-catastrophic errors in xmlOutputBufferClose.
2024-12-19 02:30:36 +01:00
Nick Wellnhofer
1e1b48918c parser: Also raise error if ctxt is NULL
Update global error variable even if context is missing because of an
invalid (NULL) argument.
2024-12-13 17:57:11 +01:00