Michael Mann
cf4f967266
Add XML_PARSE_SKIP_IDS to replace XML_SKIP_IDS
...
Mark loadset member as deprecated
Fixes #873
2025-06-22 08:03:34 -04:00
Nick Wellnhofer
a3992815b3
parser: Fix buffer overflow when parsing PublicIds
...
Regressed with 8231c0366 and 30665ae4.
2025-06-12 13:51:37 +02:00
Nick Wellnhofer
30665ae4d1
parser: Fix parsing of PublicIds and VersionNums
...
Regressed in 8231c0366.
Fixes #940.
2025-06-11 18:36:50 +02:00
Nick Wellnhofer
416da89d0b
html: Make htmlCtxtReset call xmlCtxtReset
...
The two implementations shouldn't diverge.
2025-06-08 14:22:32 +02:00
Alex Richardson
7e4247b278
parser: use XML_INT_TO_PTR when storing integers as pointers
...
This fixes warnings when using a CHERI-aware toolchain.
2025-06-06 12:11:54 -07:00
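The entry above is about storing integers in pointer-typed slots. A minimal sketch of the general technique follows, assuming a cast through `intptr_t`. The macro names `DEMO_INT_TO_PTR` and `DEMO_PTR_TO_INT` are hypothetical and are not the library's definition of XML_INT_TO_PTR.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macros for illustration only: convert through an
 * integer type of pointer width so the conversion is explicit. */
#define DEMO_INT_TO_PTR(i) ((void *) (intptr_t) (i))
#define DEMO_PTR_TO_INT(p) ((intptr_t) (p))

/* Store an integer in a void* slot and read it back. */
static int
demo_roundtrip(int value) {
    void *slot = DEMO_INT_TO_PTR(value);

    return (int) DEMO_PTR_TO_INT(slot);
}
```

On common platforms `demo_roundtrip(42)` gives back 42. The resulting pointer only carries a number and must never be dereferenced.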
Nick Wellnhofer
2b6b3945f2
Revert "SAX1: Align handling of default attributes with SAX2"
...
This reverts commit db65b2fc51.
This didn't check for duplicate default attributes.
2025-06-03 16:21:56 +02:00
Nick Wellnhofer
30375877d9
parser: Fix custom SAX parsers without cdataBlock handler
...
Use characters handler if cdataBlock handler is NULL.
Regressed with 57e4bbd8. Should fix #934.
2025-06-03 16:21:48 +02:00
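The fallback described in the entry above ("use characters handler if cdataBlock handler is NULL") can be sketched as follows. The `demo_handler` struct and all `demo_*` names are hypothetical stand-ins for a SAX handler table, not the library's types or code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical text callback: user data, text, length in bytes. */
typedef void (*demo_text_cb)(void *user, const char *text, int len);

/* Hypothetical handler table with the two callbacks of interest. */
typedef struct {
    demo_text_cb characters;
    demo_text_cb cdataBlock;
} demo_handler;

/* Deliver a CDATA section: prefer cdataBlock, fall back to characters. */
static void
demo_emit_cdata(const demo_handler *h, void *user,
                const char *text, int len) {
    assert(len >= 0);
    if (h == NULL)
        return;
    if (h->cdataBlock != NULL)
        h->cdataBlock(user, text, len);
    else if (h->characters != NULL)
        h->characters(user, text, len);
}

/* Example callback: add the delivered length to an int counter. */
static void
demo_count(void *user, const char *text, int len) {
    (void) text;
    *(int *) user += len;
}
```

With a handler that sets only `characters`, CDATA content still reaches the application instead of being dropped.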
Nick Wellnhofer
479f26f92f
regexp: Remove unfinished reimplementation
...
This was never enabled.
2025-06-03 00:28:16 +02:00
Nick Wellnhofer
0f8543e11d
parser: Fix error reporting in xmlSkipBlankCharsPEBalanced
...
Short-lived regression.
2025-06-02 14:19:01 +02:00
Nick Wellnhofer
6a6a46f017
doc: Fix autolink errors
...
Fix links, remove links to internal functions.
2025-05-28 16:02:41 +02:00
Nick Wellnhofer
7bd8d1d9cc
doc: Prefix autolinks with '#'
...
Use `#func` instead of `func()` to ignore parameters and make all
autolinks work.
2025-05-28 16:01:52 +02:00
Nick Wellnhofer
8baa5de182
parser: Avoid integer overflow in xmlParseCharDataInternal
...
`nbchar` could overflow with memory buffers larger than 2 GB, which some
new APIs allow. This shouldn't affect memory safety.
Limit the maximum number of bytes passed to the character callback to
XML_MAX_ITEMS (1e9).
2025-05-27 20:03:13 +02:00
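One way to keep an `int` length from overflowing is to cap the number of bytes handed to a callback per call. The sketch below illustrates that idea under stated assumptions: `demo_deliver`, `demo_chars_cb` and the `max_chunk` parameter are hypothetical, and this is not the code in xmlParseCharDataInternal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical character callback taking an int length. */
typedef void (*demo_chars_cb)(void *user, const char *data, int len);

/* Pass `size` bytes to `cb` in pieces of at most `max_chunk` bytes,
 * so the int length can never exceed the cap.
 * Returns the number of callback invocations. */
static size_t
demo_deliver(const char *data, size_t size, int max_chunk,
             demo_chars_cb cb, void *user) {
    size_t calls = 0;

    assert(max_chunk > 0);
    while (size > 0) {
        int n = size > (size_t) max_chunk ? max_chunk : (int) size;

        if (cb != NULL)
            cb(user, data, n);
        data += n;
        size -= (size_t) n;
        calls++;
    }
    return calls;
}

/* Example callback: add the delivered length to a size_t total. */
static void
demo_sum(void *user, const char *data, int len) {
    (void) data;
    *(size_t *) user += (size_t) len;
}
```

The total is tracked in `size_t`, while each individual length stays within `int` range regardless of the buffer size.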
Nick Wellnhofer
ab06bfa1f6
parser: Fix error return in xmlParseElementContentDecl
...
Avoid internal error later in xmlValidBuildAContentModel after
2a60ca06c.
Also avoids some unnecessary error messages.
2025-05-26 16:51:59 +02:00
Nick Wellnhofer
4dc44c83ab
parser: Rework entity boundary check for element content
...
Only use depth of input stack. This makes the input ID unused
internally.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
74ea6b483c
parser: Start using input depth for entity boundary check
...
Now that we make sure that PEs starting markup won't be popped
implicitly, it's enough to check that no new entities are on the stack
when checking boundaries.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
db65b2fc51
SAX1: Align handling of default attributes with SAX2
...
The SAX1 parser is legacy code, but it seems more maintainable to align
it with SAX2.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
e4cbc295fa
parser: Check attribute normalization standalone constraint
...
To fully implement "VC: Standalone Document Declaration", we have to
check for normalization changes caused by non-CDATA attribute types
declared externally.
Fixes #119.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
682195c869
parser: Fix "Proper Declaration/PE Nesting" validity constraint
...
Now that we handle "WFC: PE Between Declarations" correctly, we can turn
"Proper Declaration/PE Nesting" from a WFC into VC as specified.
Fixes #118.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
2f3655c9c3
parser: Pop PEs that start markup declarations explicitly
...
We currently only handle "Validity constraint: Proper Declaration/PE
Nesting", but we must detect "Well-formedness constraint: PE Between
Declarations" separately:
> The replacement text of a parameter entity reference in a DeclSep must
> match the production extSubsetDecl.
PEs in DeclSeps are PEs that start with a full markup declaration (or
another PE). These are handled in xmlParse{Internal|External}Subset. We
set a flag on these PEs and don't close them implicitly in
xmlSkipBlankCharsPE. This will make unterminated declarations in such
PEs cause a parser error. The PEs are closed explicitly in
xmlParse{Internal|External}Subset, the only location where they are
allowed to end.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
2a60ca06c0
valid: Don't check enum values
...
Rely on the parser to pass valid arguments.
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
dd1961e0d8
valid: Skip more validity checks if not validating
2025-05-25 14:26:30 +02:00
Nick Wellnhofer
47aca2c6c9
parser: Only check validity constraints when validating
2025-05-19 20:07:54 +02:00
Nick Wellnhofer
172550d225
parser: Only validate EnumerationTypes when requested
...
This has quadratic behavior and is only a validity constraint.
2025-05-19 19:58:33 +02:00
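To illustrate why such a check is worth gating, here is a sketch that assumes the quadratic part is a pairwise scan for duplicate tokens in an enumeration. That assumption, and the names `demo_has_duplicate` and `demo_check_enum`, are illustrative and not taken from the library.

```c
#include <stddef.h>
#include <string.h>

/* Pairwise duplicate scan: O(n^2) string comparisons. */
static int
demo_has_duplicate(const char *const *tokens, size_t n) {
    size_t i, j;

    for (i = 0; i < n; i++) {
        for (j = i + 1; j < n; j++) {
            if (strcmp(tokens[i], tokens[j]) == 0)
                return 1;
        }
    }
    return 0;
}

/* Run the expensive check only when validation was requested.
 * Returns 1 if the enumeration is accepted, 0 otherwise. */
static int
demo_check_enum(const char *const *tokens, size_t n, int validate) {
    if (!validate)
        return 1;
    return !demo_has_duplicate(tokens, n);
}
```

With `validate` set to 0 the scan is skipped entirely, so a long enumeration costs nothing extra when the caller did not ask for validation.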
Nick Wellnhofer
7008740a96
parser: Consolidate scanning of XML Names
...
Use new productions by default.
Fixes #194.
Fixes #364.
See #707.
2025-05-19 19:58:33 +02:00
Nick Wellnhofer
657254a87f
parser: Factor out xmlIsNameCharNew/Old
2025-05-18 01:23:25 +02:00
Nick Wellnhofer
c5b45fbc07
doc: Misc fixes
2025-05-16 19:04:20 +02:00
Nick Wellnhofer
6f4b452742
parser: Stop using ctxt->linenumbers
...
I think this was used to avoid setting the `line` member before it was
added (20+ years ago).
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
adfbeb7e08
doc: Stop using *Ptr typedefs in documentation
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
a40f36e7f2
include: Stop using *Ptr typedefs in public headers
2025-05-16 18:03:12 +02:00
Nick Wellnhofer
442c1903af
doc: Fix some damage from automated conversions
...
Add some newlines, fix returns.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
ad390a5d14
parser: Set doc properties in endDocument SAX handler
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
9bbffec568
doc: Move brief to top, params to bottom of doc comments
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
1bf44f09ba
doc: Misc fixes to parser docs
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
4a01087585
doc: Move parser option docs to enum
2025-05-06 19:51:38 +02:00
Nick Wellnhofer
cb1635a642
doc: Use @since command
2025-05-02 19:05:25 +02:00
Nick Wellnhofer
e78e05c990
doc: Fix autolinks to functions
...
Unfortunately, autolinks in .c files aren't converted by Doxygen for
some reason.
2025-05-02 17:45:31 +02:00
Nick Wellnhofer
f7c412874b
doc: Remove more comment block headers
2025-05-02 17:41:26 +02:00
Nick Wellnhofer
1eca6e3476
parser: Deprecate xmlClearParserCtxt
2025-05-02 13:33:35 +02:00
Nick Wellnhofer
e525564f65
doc: Remove empty lines at start of block
...
These lines were left over after automatic conversion.
2025-05-02 11:42:05 +02:00
Nick Wellnhofer
e549622bc5
doc: Convert documentation to Doxygen
...
Automated conversion based on a few regexes.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
69879da88f
doc: Remove email addresses from documentation
...
Also remove authorship information from generated files, hash.c and
globals.c which were rewritten.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
61890e399d
doc: Prepare for conversion to Doxygen
...
Fix many params in internal functions (not really necessary but Doxygen
warns about that in XML mode).
Fix formatting in a few corner cases that automatic conversion can't
handle.
Rearrange some DOC_DISABLE blocks.
2025-05-01 23:23:42 +02:00
Nick Wellnhofer
0bac84b1bd
Add missing NULL checks to public API functions
2025-04-25 13:15:29 +02:00
Nick Wellnhofer
72906f161c
parser: Make undeclared entities in XML content fatal
...
When parsing XML content with functions like xmlParseBalancedChunk or
xmlParseInNodeContext, make undeclared entities always a fatal error to
match 2.13 behavior.
This was deliberately changed in 4f329dc5, probably to make the tests
pass.
Should fix #895.
2025-04-25 13:15:29 +02:00
Nick Wellnhofer
b85d77d156
http: Remove built-in HTTP client
...
Stubs are retained for ABI compatibility.
Fixes #631.
Obsoletes #160.
2025-04-20 18:21:06 +02:00
Nick Wellnhofer
a5c4a6efe7
parser: Fix XML_PARSE_NOBLANKS dropping non-whitespace text
...
Regressed with 1f5b5371.
Fixes #884.
2025-03-28 16:52:34 +01:00
Nick Wellnhofer
69b83bb68e
encoding: Detect truncated multi-byte sequences with ICU
...
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00
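The role of a `flush` flag can be sketched with a simplified UTF-8 scanner: an incomplete sequence at the end of the buffer means "need more input" while streaming, but becomes an error once the caller signals that no more input will arrive. All `demo_*` and `DEMO_*` names are hypothetical, and this is not the library's converter code.

```c
#include <stddef.h>

enum {
    DEMO_OK = 0,         /* buffer ends on a sequence boundary */
    DEMO_NEED_MORE = 1,  /* incomplete tail, more input may follow */
    DEMO_TRUNCATED = -1, /* incomplete tail at end of input */
    DEMO_INVALID = -2    /* byte cannot start a sequence */
};

/* Length of a UTF-8 sequence from its lead byte, or -1 if invalid. */
static int
demo_seq_len(unsigned char lead) {
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return -1;
}

/* Scan a buffer by lead bytes only (continuation bytes are not
 * validated in this sketch). `flush` is nonzero at end of input. */
static int
demo_scan(const unsigned char *buf, size_t len, int flush) {
    size_t i = 0;

    while (i < len) {
        int n = demo_seq_len(buf[i]);

        if (n < 0)
            return DEMO_INVALID;
        if ((size_t) n > len - i)
            return flush ? DEMO_TRUNCATED : DEMO_NEED_MORE;
        i += (size_t) n;
    }
    return DEMO_OK;
}
```

The same buffer can therefore yield two different results depending only on whether the caller is flushing.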
Nick Wellnhofer
8696ebe182
parser: Fix ignorableWhitespace callback
...
If ignorableWhitespace differs from the "characters" callback, we have
to check for blanks as well.
Regressed with 1f5b537.
2025-03-11 16:34:30 +01:00
Nick Wellnhofer
25490528af
parser: Fix spurious error in SAX mode
...
Short-lived regression from 5f0b1378.
2025-03-11 16:34:30 +01:00
Nick Wellnhofer
5f0b1378d7
parser: Add more parser context accessors
...
Fixes #763.
2025-03-08 22:36:06 +01:00