mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2026-01-26 21:41:34 +03:00
456 Commits

Author SHA1 Message Date
Nick Wellnhofer
bc0bb67b57 html: Don't abort on encoding errors
Always enable recovery mode when parsing HTML, so we don't raise fatal
errors.

Regressed with 462bf0b7. Fixes #947.
2025-07-10 12:46:22 +02:00
Nick Wellnhofer
71e1e8af5e schematron: Fix memory safety issues in xmlSchematronReportOutput
Fix use-after-free (CVE-2025-49794) and type confusion (CVE-2025-49796)
in xmlSchematronReportOutput.

Fixes #931.
Fixes #933.
2025-07-04 14:44:54 +02:00
Nick Wellnhofer
24d7e15914 schematron: Complete fix for CVE-2025-49795
- Fix memory leaks
- Fix tests
2025-07-04 12:46:29 +02:00
Michael Mann
499bcb78ab Schematron: Fix null pointer dereference leading to DoS
(CVE-2025-49795)

Fixes #932
2025-07-04 09:35:14 +00:00
Michael Mann
069bcda17d Fix potential buffer overflows of interactive shell
CVE-2025-6170

Fixes #941
2025-07-02 13:29:19 -04:00
Omar Siam
9760a14fb9 relaxng: In the simplification step also unlink notAllowed refs from choice
This fixes false reports of disallowed content when a choice references a notAllowed pattern, so the result matches that of notAllowed written as an element directly inside the choice.
2025-06-30 13:47:33 +00:00
Nick Wellnhofer
ad0f5d27c4 tree: Fix xmlGetNodePath
- Fix quadratic behavior
- Don't truncate names

Fixes #715.
2025-06-24 13:57:20 +02:00
Omar Siam
bb7169b5ad Fix RelaxNG schema being parsed into an infinite attrs->next loop
Test data for the bug.
2025-06-10 18:34:44 +02:00
Nick Wellnhofer
c8cea39d8a save: Fix serialization of attribute defaults containing <
Long-standing bug that produced invalid XML.
2025-05-11 20:29:25 +02:00
Nick Wellnhofer
05bd1720ce parser: Fix parsing of DTD content
Regressed in 2.11. Fixes #868.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
9f86dae989 test: Add test case for UAF in xmlSchemaIDCFillNodeTables
2025-02-20 11:35:47 +01:00
Nick Wellnhofer
71122421a1 html: Make implied <p> tags more deterministic
libxml2's HTML parser adds <p> start tags in some situations. This
behavior, which doesn't follow any standard, was added in 2000, see
here: http://veillard.com/XML/messages/0655.html

Text nodes that only contain whitespace don't imply a <p> tag, but the
whitespace check cannot work reliably if we're parsing partial text data,
which can happen with both the pull and the push parser.

The logic in `areBlanks` is hard to follow. The checks involving `CUR`
depend on the position of the input pointer and seem dubious. It's also
possible that the behavior changed inadvertently with a later commit.
As a result, it's hard to come up with good test cases.

We now process leading whitespace before creating implied tags. This is
more in line with HTML5 and should avoid at least some issues with
partial text data.

For example, parsing the string "<head>   x" used to result in:

<html>
<head></head>
<body><p>   x</p></body>
</html>

And now results in:

<html>
<head>   </head>
<body><p>x</p></body>
</html>

Except for the implied <p> tag, this matches HTML5.
2025-02-13 14:31:44 +01:00
Nick Wellnhofer
b4d3d87ed2 parser: Fix parsing of doctype declarations
Fix some long-standing issues.

Fixes #504.
2025-02-02 11:15:45 +01:00
Nick Wellnhofer
459146140a xpath: Fix parsing of non-ASCII names
Fix a long-standing issue where QNames starting with a non-ASCII
character would be rejected. This became more visible after "streaming"
XPath evaluation was disabled since the latter handled non-ASCII names
correctly.

Fixes #818.
2024-11-05 12:30:44 +01:00
Nick Wellnhofer
ffb058f484 parser: Fix detection of duplicate attributes
We really need a second scan if more than one namespace clash was
detected.
2024-10-28 20:26:55 +01:00
Nick Wellnhofer
c6af101728 html: Test tokenizer against html5lib test suite
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
e1834745e0 html: Add character data tests
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
5951179239 html: Parse named character references according to HTML5
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
bd9eed4694 parser: Make unsupported encodings an error in declarations
This was changed in 45157261, but in encoding declarations, unsupported
encodings should raise a fatal error.

Fixes #794.
2024-09-02 19:29:39 +02:00
Nick Wellnhofer
8ae06d5223 SAX2: Don't merge CDATA sections
The Document Object Model (DOM) Level 3 Core Specification says:

> Adjacent CDATASection nodes are not merged by use of the normalize
> method of the Node interface.

Fixes #412.
2024-08-29 01:31:19 +02:00
Nick Wellnhofer
322e733b84 xinclude: Fix fallback for text includes
Fixes #772.
2024-07-18 19:32:23 +02:00
Nick Wellnhofer
30be984a0f encoding: Rework ISO-8859-X conversion
Optimize code. Pass tables as context parameter. Check for
XML_ENC_ERR_SPACE.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
7c11da2d98 tests: Clarify licence of test/intsubset2.xml
2024-06-27 12:49:06 +02:00
Nick Wellnhofer
f06fc933cd tests: Move tests for executables to separate script
Move tests for xmllint shell and xmlcatalog to separate scripts and
enable them in Autotools.
2024-06-22 21:59:03 +02:00
Nick Wellnhofer
1dd5e76a69 xinclude: Don't remove root element
Don't replace include element at root with empty nodeset.
2024-06-18 20:12:03 +02:00
Nick Wellnhofer
52ce0d70f9 tests: Add XInclude test for issue #733
2024-06-17 17:35:12 +02:00
Nick Wellnhofer
669bd34993 xpointer: Remove support for XPointer locations
The latest spec for what is essentially an XPath extension seems to be
this working draft from 2002:

    https://www.w3.org/TR/xptr-xpointer/

The xpointer() scheme is listed as "being reviewed" in the XPointer
registry since at least 2006. libxml2 seems to be the only modern
software that tries to implement this spec, but the code has many bugs
and quality issues.

If you configure --with-legacy, old symbols are retained for ABI
compatibility.
2024-06-12 18:20:01 +02:00
Nick Wellnhofer
651465f98c test: Remove unused test files
2024-04-24 22:50:53 +02:00
Nick Wellnhofer
45fe9924f0 parser: Don't create reference in xmlLookupGeneralEntity
This should only be done in xmlParseReference.

The handling of undeclared entities is still somewhat inconsistent. In
element content we create references even if entity substitution is
enabled. In attribute values undeclared entities are always ignored.
2024-04-23 18:36:15 +02:00
Nick Wellnhofer
f506ec6654 parser: Always decode entities in namespace URIs
Also decode entities in namespace URIs if entity substitution wasn't
requested. This should fix some corner cases when comparing namespace
URIs. The Namespaces in XML 1.0 spec says:

> In a namespace declaration, the URI reference is the normalized value
> of the attribute, so replacement of XML character and entity
> references has already been done before any comparison.

Make the serialization code escape special characters in namespace URIs
like in attribute values. This fixes serialization if entities were
substituted when parsing.

Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106
2024-04-15 12:34:26 +02:00
Seiya Nakata
5bb84b47b8 relaxng: Fix tree corruption in xmlRelaxNGParseNameClass
Don't create cycles in tree structure. This will lead to an infinite
loop or call stack overflow later.

Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/711
2024-04-05 13:45:06 +02:00
Nick Wellnhofer
186562a182 parser: Fix detection of duplicate attributes in XML namespace
Fixes a regression from commit e0dd330b, resulting in duplicate
attributes in the predefined XML namespace not being detected or
extraneous default attributes being passed.

Fixes #704.
2024-03-12 20:02:52 +01:00
Nick Wellnhofer
f237e5b934 parser: Avoid duplicate namespace errors
Don't report an extra attribute uniqueness error if a namespace is
undeclared. This matches old behavior.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
37c6618be5 parser: Rework parsing of attribute and entity values
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.

Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.

Normalize attribute values in a single pass while expanding entities.

Be more lenient in recovery mode.

If no entity substitution was requested, validate entities without
expanding. Fixes #596.

Also fixes #655.
2024-01-02 15:42:03 +01:00
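The phrase "validate UTF-8 byte sequences without decoding" in the commit above can be illustrated with a minimal sketch. It is not the code from libxml2; the names `utf8_seq_len` and `utf8_is_valid` are hypothetical. The idea shown is that well-formedness can be checked from byte ranges alone, without ever assembling a code point. XML-specific character restrictions are not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a libxml2 function.
 * Returns the length (1-4) of the well-formed UTF-8 sequence starting
 * at s, or 0 if the sequence is malformed or truncated. No code point
 * is computed; only byte ranges are inspected. */
size_t utf8_seq_len(const unsigned char *s, size_t avail) {
    assert(s != NULL);
    if (avail == 0)
        return 0;

    unsigned char c = s[0];
    if (c < 0x80)
        return 1;                       /* ASCII */
    if (c < 0xC2)
        return 0;                       /* stray continuation or overlong lead */

    if (c < 0xE0) {                     /* 2-byte sequence */
        if (avail < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        return 2;
    }

    if (c < 0xF0) {                     /* 3-byte sequence */
        if (avail < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (c == 0xE0 && s[1] < 0xA0)
            return 0;                   /* overlong encoding */
        if (c == 0xED && s[1] >= 0xA0)
            return 0;                   /* UTF-16 surrogate range */
        return 3;
    }

    if (c < 0xF5) {                     /* 4-byte sequence */
        if (avail < 4 || (s[1] & 0xC0) != 0x80 ||
            (s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80)
            return 0;
        if (c == 0xF0 && s[1] < 0x90)
            return 0;                   /* overlong encoding */
        if (c == 0xF4 && s[1] >= 0x90)
            return 0;                   /* beyond U+10FFFF */
        return 4;
    }

    return 0;                           /* 0xF5..0xFF never appear in UTF-8 */
}

/* Returns 1 if the whole buffer is well-formed UTF-8, 0 otherwise. */
int utf8_is_valid(const unsigned char *s, size_t len) {
    size_t i = 0;
    while (i < len) {
        size_t n = utf8_seq_len(s + i, len - i);
        if (n == 0)
            return 0;
        i += n;
    }
    return 1;
}
```

Because the check only compares bytes against fixed ranges, a caller can skip over multi-byte sequences by the returned length and copy them unchanged.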
Nick Wellnhofer
d944a41515 parser: Fix in-parameter-entity and in-external-dtd checks
Use ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.

Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
2023-12-29 01:19:56 +01:00
Nick Wellnhofer
b8313b589f xpath: Rewrite substring-before and substring-after
Don't use buffers. Check malloc failures.
2023-12-28 16:47:45 +01:00
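As a rough illustration of "don't use buffers, check malloc failures" in the commit above, here is a minimal sketch of the XPath 1.0 `substring-before` and `substring-after` semantics on plain C strings. It is not the libxml2 implementation; the function names and the helper `dup_range` are hypothetical, and the real code works on `xmlChar` strings and XPath objects.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy s[0..len) into a fresh NUL-terminated
 * string. Returns NULL if malloc fails, so callers can report it. */
static char *dup_range(const char *s, size_t len) {
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}

/* Text before the first occurrence of needle, or "" if it is absent.
 * The result is allocated exactly once, at its final size. */
char *substring_before(const char *hay, const char *needle) {
    assert(hay != NULL && needle != NULL);
    const char *hit = strstr(hay, needle);
    return dup_range(hay, hit != NULL ? (size_t)(hit - hay) : 0);
}

/* Text after the first occurrence of needle, or "" if it is absent. */
char *substring_after(const char *hay, const char *needle) {
    assert(hay != NULL && needle != NULL);
    const char *hit = strstr(hay, needle);
    if (hit == NULL)
        return dup_range("", 0);
    hit += strlen(needle);
    return dup_range(hit, strlen(hit));
}
```

Both functions return NULL only on allocation failure, which keeps the out-of-memory case distinguishable from the "not found" case (an empty string). The caller owns the result and releases it with `free`.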
Nick Wellnhofer
f3fa34dcad parser: Fix general entity parsing
Clear namespace database.

Ignore non-fatal errors.
2023-12-28 16:47:41 +01:00
Nick Wellnhofer
6e3a2ac660 xinclude: Rework xml:base fixup
The xml:base fixup was broken in more complex cases.

Also avoid parsing and building the included URI multiple times.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
f0df3e6d00 tests: Try to fix RelaxNG test cases
These were added recently in ea695ac0 and 8074b881 but were a total mess
of symbolic links and apparently mixed up files.

Symbolic links don't work on Windows.

Try to salvage one of the tests.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
83c6aeef49 relaxng: Improve error handling
Pass RelaxNG structured error handler to XML parser.

Handle malloc failure from xmlRaiseError.

Remove argument from memory error handler.

Use xmlRaiseMemoryError.

Don't use xmlGenericError.

Remove TODO macro.
2023-12-21 15:01:42 +01:00
Nick Wellnhofer
7d446e9736 parser: Fix namespaces redefined from default attributes
This regressed in commit e0dd330b.

Also fixes a long-standing issue where namespaces from default
attributes weren't added if they match an existing namespace.

Fixes #643.
2023-12-08 12:19:16 +01:00
Nick Wellnhofer
e395946194 html: Reenable buggy detection of XML declarations
Switch to UTF-8 if a document starts with '<?xm' to match old behavior.
Also enable this check in the push parser.

Fixes #637.
2023-11-30 16:22:59 +01:00
Nick Wellnhofer
43b511fa71 parser: Make CRLF increment line number
Partial revert of cb927e85 fixing CRLFs not incrementing the line
number.

This requires reworking xmlParseQNameHashed. The original implementation
prompted the change to xmlCurrentChar, which really shouldn't modify the
'cur' pointer as a side effect. But the NEXTL macro relies on this
behavior.

Ultimately, we should reintroduce the change to xmlCurrentChar and fix
the NEXTL macro. This will lead to single CRs incrementing the line
number as well which seems more consistent.

Fixes #628.
2023-11-26 15:18:09 +01:00
Nick Wellnhofer
a2b5c90a44 hash: Fix deletion of entries during scan
Functions like xmlCleanSpecialAttr scan a hash table and possibly delete
entries in the callback. xmlHashScanFull must detect such deletions and
rescan the entry.

This regressed when rewriting the hash table code in 4a513d56.

Fixes #626.
2023-11-21 15:28:59 +01:00
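The "detect deletions and rescan the entry" requirement from the commit above can be sketched with a toy table. This is not libxml2's hash table; `ToyTable`, `toy_scan` and the other names are hypothetical. The sketch assumes that removing an entry moves a later entry into the freed slot, that keys are distinct pointers, and that the callback only removes the entry it was called for.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_CAP 8

/* Toy stand-in for a hash table: a compact array of keys. */
typedef struct {
    const char *keys[TOY_CAP];
    size_t size;
} ToyTable;

typedef void (*ToyScanner)(ToyTable *t, const char *key, void *data);

/* Removing an entry shifts the following entries down by one, so a
 * different entry ends up in the slot that was just visited. */
void toy_remove(ToyTable *t, size_t idx) {
    assert(idx < t->size);
    memmove(&t->keys[idx], &t->keys[idx + 1],
            (t->size - idx - 1) * sizeof t->keys[0]);
    t->size--;
}

/* Returns the index of key (compared by pointer), or t->size. */
size_t toy_find(const ToyTable *t, const char *key) {
    size_t i;
    for (i = 0; i < t->size; i++)
        if (t->keys[i] == key)
            break;
    return i;
}

/* Visit every entry exactly once, even if the callback deletes the
 * entry it is called for. */
void toy_scan(ToyTable *t, ToyScanner fn, void *data) {
    size_t i = 0;
    while (i < t->size) {
        const char *before = t->keys[i];
        fn(t, before, data);
        /* Slot i now holds a different entry: the callback deleted the
         * old one, so visit the same slot again instead of advancing. */
        if (i < t->size && t->keys[i] != before)
            continue;
        i++;
    }
}

/* Example callback: count visits and drop keys starting with '-'. */
void drop_marked(ToyTable *t, const char *key, void *data) {
    size_t *visited = data;
    (*visited)++;
    if (key[0] == '-')
        toy_remove(t, toy_find(t, key));
}
```

A scan that always advanced the index would skip whichever entry slid into the freed slot; comparing the slot's content before and after the callback is one simple way to notice the deletion.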
Nick Wellnhofer
7a2d412f68 parser: Copy default namespace in xmlParseBalancedChunkMemory
2023-10-31 20:19:27 +01:00
Nick Wellnhofer
e0c2f14d83 parser: Copy namespaces in xmlParseBalancedChunkMemory
Reenable copying of namespaces but don't set SAX data. This should
match the old behavior.
2023-10-31 14:04:57 +01:00
Nick Wellnhofer
b76d81dab3 parser: Fix regression when push parsing parameter entities
Short-lived regression from 834b8123.

Also shrink parameter entity buffers when push parsing.
2023-10-06 13:11:19 +02:00
Nick Wellnhofer
134d2ad890 parser: Protect against quadratic default attribute expansion
2023-10-06 12:47:24 +02:00
Nick Wellnhofer
0ba22c0513 parser: Support encoded external PEs in entity values
Corner case which was never supported.
2023-10-06 12:28:59 +02:00
Nick Wellnhofer
e48f3d8e0a tests: Add more tests for redefined attributes
2023-09-29 12:43:08 +02:00