mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00
Commit Graph

820 Commits

Author SHA1 Message Date
Nick Wellnhofer
bd9de3a31f malloc-fail: Fix null deref in xmlAddDefAttrs
Found with libFuzzer, see #344.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
33d4a0fe40 parser: Fix progress check in xmlParseExternalSubset
Avoid infinite loop. Short-lived regression from f61b8a62.

Found with libFuzzer.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
74aa61e0bd parser: Halt parser on DTD errors
If we try to continue parsing after an error in the internal or external
subset, entity expansion accounting gets more complicated. Simply halt
the parser.

Found with libFuzzer.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
d320a683d1 parser: Fix entity check in attributes
Don't set the "checked" flag when checking entities in default attribute
values. These entities could reference other entities which weren't
defined yet, so the check isn't reliable.

This fixes a short-lived regression which could lead to a call stack
overflow later in xmlStringGetNodeList.
2023-01-17 13:59:24 +01:00
Nick Wellnhofer
59b3366178 error: Limit number of parser errors
Reporting errors is expensive and some abusive test cases can generate
an error for each invalid input byte. This causes the parser to spend
most of the time with error handling. Limit the number of errors and
warnings to 100.
2022-12-27 14:41:19 +01:00
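
The cap described in this commit can be pictured as a counter check in front of the reporting path. The following is a minimal sketch, not the libxml2 implementation: the `SketchErrCtxt` type, `SKETCH_MAX_REPORTS`, and `sketch_should_report` are hypothetical names, and keeping separate counters for errors and warnings is an assumption, since the message only gives the limit of 100.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit taken from the commit message ("limit ... to 100"). */
#define SKETCH_MAX_REPORTS 100

/* Hypothetical stand-in for the parser context's report counters. */
typedef struct {
    int nbErrors;
    int nbWarnings;
} SketchErrCtxt;

/*
 * Return 1 if the message should still be reported, 0 once the limit for
 * its class (error or warning) has been reached. Assumption: errors and
 * warnings are counted separately.
 */
static int sketch_should_report(SketchErrCtxt *ctxt, int isWarning) {
    int *counter;

    assert(ctxt != NULL);
    counter = isWarning ? &ctxt->nbWarnings : &ctxt->nbErrors;
    if (*counter >= SKETCH_MAX_REPORTS)
        return 0;           /* limit reached: skip the expensive reporting */
    (*counter)++;
    return 1;
}
```

A caller would test `sketch_should_report(&ctxt, 0)` before formatting an error message, so that input producing one error per invalid byte stops paying the formatting cost after the first 100 reports.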
Nick Wellnhofer
66e9fd66e8 parser: Fix infinite loop with push parser in recovery mode
Short-lived regression from commit b1f9c193. Found by OSS-Fuzz.
2022-12-25 21:30:32 +01:00
Nick Wellnhofer
49b54d7e2b parser: Fix null deref in xmlStringDecodeEntitiesInt
Short-lived regression.
2022-12-25 15:06:51 +01:00
Nick Wellnhofer
1865668b61 parser: Fix accounting of consumed input bytes
Only add consumed bytes if

- we're not parsing an entity
- we're parsing external parameter entities for the first time.

Always ignore internal parameter entities.
2022-12-23 23:11:11 +01:00
Nick Wellnhofer
bc18f4a67c parser: Lower entity nesting limit with XML_PARSE_HUGE
The old limit of 1024 could lead to excessively deep call stacks. This
could probably be set much lower without causing issues.
2022-12-23 22:11:18 +01:00
Nick Wellnhofer
dd62e541ec parser: Don't increase depth twice when parsing internal entities
Fix xmlParseBalancedChunkMemoryInternal.
2022-12-23 22:11:18 +01:00
Nick Wellnhofer
a41b09c739 parser: Improve detection of entity loops
Set a flag to detect entity loops at once instead of processing until
the depth limit is exceeded.
2022-12-23 22:11:18 +01:00
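
One common way to detect a reference loop at once is to mark an entity while it is being expanded and to treat a second visit to a marked entity as a loop. The sketch below illustrates that general technique under simplifying assumptions; `SketchEntity`, `SKETCH_ENT_EXPANDING`, and `sketch_expand` are hypothetical and do not reproduce the libxml2 data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: set while the entity's content is being expanded. */
#define SKETCH_ENT_EXPANDING 1u

/* Hypothetical entity model: content is reduced to a list of references. */
typedef struct SketchEntity {
    unsigned flags;
    struct SketchEntity *refs[4];   /* entities referenced by the content */
    size_t nrefs;
} SketchEntity;

/*
 * Expand an entity recursively. Returns 0 on success and -1 as soon as an
 * entity is reached that is already being expanded, i.e. on a loop.
 */
static int sketch_expand(SketchEntity *ent) {
    size_t i;
    int ret = 0;

    assert(ent != NULL);
    if (ent->flags & SKETCH_ENT_EXPANDING)
        return -1;                          /* loop detected immediately */

    ent->flags |= SKETCH_ENT_EXPANDING;
    for (i = 0; i < ent->nrefs; i++) {
        if (sketch_expand(ent->refs[i]) != 0) {
            ret = -1;
            break;
        }
    }
    ent->flags &= ~SKETCH_ENT_EXPANDING;    /* clear on the way out */
    return ret;
}
```

With a chain a → b → c the call succeeds; once c refers back to a, the loop is reported on the first revisit rather than after the nesting limit has been exhausted.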
Nick Wellnhofer
d972393f30 parser: Only report a single entity error
Don't report errors multiple times for nested entity references.
2022-12-23 22:10:39 +01:00
Nick Wellnhofer
077df27eb1 parser: Fix integer overflow of input ID
Applies a patch from Chromium. Also stop incrementing input ID of
subcontexts. This isn't necessary.

Fixes #465.
2022-12-22 15:22:01 +01:00
David Kilzer
0bd4e4e032 xmlParseStartTag2() contains typo when checking for default definitions for an attribute in a namespace
* parser.c:
(xmlParseStartTag2):
- Fix index into defaults->values.  It is only correct the first
  time through the loop when i == 0.

Fixes #467.
2022-12-21 19:35:33 -08:00
Nick Wellnhofer
b47ebf047e parser: Deprecate xmlString*DecodeEntities
These are internal functions.
2022-12-21 21:06:03 +01:00
Nick Wellnhofer
ec6633afae parser: Remove useless ent->etype test in xmlParseReference
If ent->etype is invalid, ret can't equal XML_ERR_OK.
2022-12-21 20:35:31 +01:00
Nick Wellnhofer
7ee7f0360a parser: Remove useless ent->children tests in xmlParseReference
The if-block before always returns if ent->children == NULL.
2022-12-21 20:35:31 +01:00
Nick Wellnhofer
ce76ebfd13 entities: Stop counting entities
This was only used in the old version of xmlParserEntityCheck.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
a3c8b1805e entities: Add entity flag for loop check
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
463bbeeca1 entities: Rework entity amplification checks
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.

We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.

We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.

A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA, which have caused problems in the past, work.

The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.

Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.

Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.

Fixes #294.
Fixes #345.
2022-12-21 20:19:10 +01:00
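
The accounting described in this commit message can be sketched roughly as follows. This is an illustration only, not the libxml2 code: all `sketch_` and `SKETCH_` names are hypothetical, 10 MB is taken as 10,000,000 bytes, and the fixed cost of 20 per entity reference is an assumed value, since the message does not state it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical constants modelled on the numbers in the commit message. */
#define SKETCH_MAX_AMPLIFICATION 10UL        /* expanded size <= 10x input */
#define SKETCH_ALLOWED_EXPANSION 10000000UL  /* 10 MB are always allowed */
#define SKETCH_ENTITY_FIXED_COST 20UL        /* assumed cost per reference */

/* Add b to *acc, clamping at ULONG_MAX instead of wrapping around. */
static void sketch_saturated_add(unsigned long *acc, unsigned long b) {
    assert(acc != NULL);
    if (b > ULONG_MAX - *acc)
        *acc = ULONG_MAX;
    else
        *acc += b;
}

/* Charge one entity reference: its expanded size plus a fixed cost. */
static void sketch_charge_reference(unsigned long *expanded,
                                    unsigned long entitySize) {
    sketch_saturated_add(expanded, entitySize);
    sketch_saturated_add(expanded, SKETCH_ENTITY_FIXED_COST);
}

/*
 * Return 1 if the expansion is excessive: more than the always-allowed
 * amount and more than 10 times the number of input bytes consumed.
 * Dividing instead of multiplying avoids overflow on 32-bit unsigned long.
 */
static int sketch_amplification_exceeded(unsigned long consumed,
                                         unsigned long expanded) {
    if (expanded <= SKETCH_ALLOWED_EXPANSION)
        return 0;
    return expanded / SKETCH_MAX_AMPLIFICATION > consumed;
}
```

Saturating the counter matters because a wrapped-around total would look small and let an oversized expansion pass the check.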
Nick Wellnhofer
7e3f469be9 entities: Use flags to store '<' check results
Instead of abusing the LSB of the "checked" member, store the result
of testing for occurrence of '<' character in "flags".

Also use the flags in xmlParseStringEntityRef instead of rescanning
every time.
2022-12-19 15:59:49 +01:00
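
Caching a scan result in dedicated flag bits, instead of overloading the low bit of another field, might look like the sketch below. The type and flag names are hypothetical and chosen for illustration; only the idea of storing the outcome of the '<' test in "flags" comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits: one records that the scan ran, one its result. */
#define SKETCH_ENT_CHECKED_LT  (1u << 0)
#define SKETCH_ENT_CONTAINS_LT (1u << 1)

/* Hypothetical entity with its replacement text and a flags word. */
typedef struct {
    const char *content;
    unsigned flags;
} SketchLtEntity;

/*
 * Return 1 if the entity content contains '<'. The content is scanned at
 * most once; later calls only read the cached flag bits.
 */
static int sketch_contains_lt(SketchLtEntity *ent) {
    assert(ent != NULL);
    if (!(ent->flags & SKETCH_ENT_CHECKED_LT)) {
        ent->flags |= SKETCH_ENT_CHECKED_LT;
        if (ent->content != NULL && strchr(ent->content, '<') != NULL)
            ent->flags |= SKETCH_ENT_CONTAINS_LT;
    }
    return (ent->flags & SKETCH_ENT_CONTAINS_LT) != 0;
}
```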
Nick Wellnhofer
481d79d44c entities: Add XML_ENT_PARSED flag
To check whether an entity was already parsed, the code previously
tested whether "checked" was non-zero or "children" was non-null. The
"children" check could be unreliable because an empty entity also
results in an empty (NULL) node list. Use a separate flag to make this
check more reliable.
2022-12-19 15:26:46 +01:00
Alex Richardson
4b959ee168 Remove hacky heuristic from b2dc5675e9
Checking whether the context is close to the parent context by hardcoding
250 is not portable (I noticed tests were failing on Morello since the value
is 288 there due to pointers being 128 bits). Instead we should ensure
that the XML_VCTXT_USE_PCTXT flag is not set in cases where the user data
is not actually a parser context (or ideally add a separate field, but that
would be an ABI break).

From what I can see in the source, the XML_VCTXT_USE_PCTXT is only set if
the userData field points to a valid context, and if this is not the case
the flag should be cleared when changing userData rather than relying on
the offset between the two. Looking at the history, I think
d7cb33cf44 fixed most of the need for this
workaround, but it looks like there are a few more locations that need
updating. This commit changes two more places to set/clear/copy the
XML_VCTXT_USE_PCTXT flag, so this heuristic should not be needed anymore.
I've also dropped two = NULL assignments in xmllint since they are not
needed after a call to memset().

There was also an uninitialized vctxt.flags (and other fields) in
`xmlShellValidate()`, which I've fixed by adding a memset() call.
2022-12-01 15:31:25 +00:00
Alex Richardson
c62c0d82cc Correctly relocate internal pointers after realloc()
Adding an offset to a deallocated pointer and assuming that it can be
dereferenced is undefined behaviour. When running libxml2 on CHERI-enabled
systems such as Arm Morello this results in the creation of an out-of-bounds
pointer that cannot be dereferenced and therefore crashes at runtime.

The effect of this UB is not limited to architectures such as CHERI.
Incorrect relocation of pointers after realloc can in fact cause
FORTIFY_SOURCE errors with recent GCC:
https://developers.redhat.com/articles/2022/09/17/gccs-new-fortification-level
2022-12-01 15:14:40 +00:00
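
A well-defined way to relocate an internal pointer is to record its offset from the base before calling realloc() and to rebuild it from the new base afterwards, so that no arithmetic is ever done on the freed pointer. The sketch below shows this pattern on a hypothetical `SketchBuf` type; it is not the code changed by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer with an internal pointer into its own allocation. */
typedef struct {
    char *base;     /* start of the allocation */
    char *cur;      /* current position, points into [base, base + size] */
    size_t size;
} SketchBuf;

/*
 * Resize the buffer to new_size bytes. Returns 0 on success, -1 on failure
 * (the buffer is left unchanged in that case).
 */
static int sketch_buf_grow(SketchBuf *buf, size_t new_size) {
    size_t cur_off;
    char *tmp;

    assert(buf != NULL && buf->base != NULL);

    /* Capture the offset while the old pointers are still valid. */
    cur_off = (size_t)(buf->cur - buf->base);
    if (new_size < cur_off)
        return -1;              /* would leave cur out of bounds */

    tmp = realloc(buf->base, new_size);
    if (tmp == NULL)
        return -1;              /* old block is still allocated */

    /* Rebuild the internal pointer from the new base only. */
    buf->base = tmp;
    buf->cur = tmp + cur_off;
    buf->size = new_size;
    return 0;
}
```

The problematic variant computes `new_base - old_base` after the call and adds that difference to the stale pointer, which relies on the value of a pointer whose storage has already been released.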
Nick Wellnhofer
c16fd705bb xpath: Make init function private
2022-11-27 02:11:07 +01:00
Nick Wellnhofer
53ab38408d encoding: Make init function private
2022-11-27 02:11:07 +01:00
Nick Wellnhofer
05c3a458aa tests: Check that xmlInitParser doesn't allocate memory
2022-11-27 02:11:07 +01:00
Nick Wellnhofer
78c0391bc7 parser: Register atexit handler in locked section
2022-11-25 15:12:56 +01:00
Nick Wellnhofer
ed053c50cf dict: Make init/cleanup functions private
2022-11-25 15:02:04 +01:00
Nick Wellnhofer
7010d8779b threads: Rework initialization
Make init/cleanup functions private. Merge xmlOnceInit into
xmlInitThreadsInternal.
2022-11-25 15:02:04 +01:00
Nick Wellnhofer
9dbf137455 parser: Make some module init/cleanup functions private
2022-11-25 15:02:04 +01:00
Nick Wellnhofer
cecd364dd2 parser: Don't call *DefaultSAXHandlerInit from xmlInitParser
Change the default handler definitions to match the result after calling
the initialization functions.

This makes sure that no thread-local variables are accessed when calling
xmlInitParser.
2022-11-25 15:02:04 +01:00
Nick Wellnhofer
b1f9c19383 parser: Fix push parser with unterminated CDATA sections
Short-lived regression found by OSS-Fuzz.
2022-11-22 21:39:01 +01:00
Nick Wellnhofer
0e193f0d61 parser: Remove dangerous check in xmlParseCharData
If this check succeeds, xmlParseCharData could be called over and over
again without making progress, resulting in an infinite loop.

It's only important to check for XML_PARSER_EOF which is done later.

Related to #441.
2022-11-21 22:09:19 +01:00
Nick Wellnhofer
94ca36c2c4 parser: Restore parser state in xmlParseCDSect
Fixes #441.
2022-11-21 22:07:11 +01:00
Nick Wellnhofer
a8b31e68c2 parser: Fix progress check when parsing character data
Skip over zero bytes to guarantee progress. Short-lived regression.
2022-11-21 21:39:10 +01:00
Nick Wellnhofer
c63900fbc1 parser: Check terminate flag when push parsing CDATA sections
Found by OSS-Fuzz.
2022-11-21 20:39:17 +01:00
Nick Wellnhofer
a781ee3395 Revert "parser: Add overflow checks to xmlParseLookup functions"
This reverts commit bfc55d6884.

It's better to fix the root cause.
2022-11-21 20:11:14 +01:00
Nick Wellnhofer
bfc55d6884 parser: Add overflow checks to xmlParseLookup functions
Short-lived regression found by OSS-Fuzz.
2022-11-21 18:29:54 +01:00
Nick Wellnhofer
9e4a46ace6 parser: Merge misc, prolog and epilog cases in push parser
2022-11-20 22:03:08 +01:00
Nick Wellnhofer
55fb8f72ac parser: Fix push parser with 1-3 byte initial chunk
Make sure that ctxt->charset is initialized properly.
2022-11-20 21:27:59 +01:00
Nick Wellnhofer
68a6518c45 parser: Rewrite push parser boundary checks
Remove inaccurate xmlParseCheckTransition check.

Remove non-incremental xmlParseGetLasts check.

Add functions that check for several boundary constructs more
accurately, keeping track of progress in ctxt->checkIndex.

Fixes #439.
2022-11-20 21:27:08 +01:00
Nick Wellnhofer
2059df5358 buf: Deprecate static/immutable buffers
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
4955e0c9e1 io: Don't shrink memory input buffers
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
117bab2256 parser: Don't call xmlSHRINK from push parser
xmlSHRINK also calls xmlParserInputGrow which isn't needed in the push
parser.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
f00739c12e parser: Ignore cdata argument in xmlParseCharData
It never could be used to parse CDATA sections.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
e4f56a7213 parser: Simplify xmlParseConditionalSections
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
3582b07bd2 parser: Fix content parser progress checks
This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, change some content
parser functions to make guaranteed progress on certain byte sequences.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
f7ad338e09 parser: Fix attribute parser progress checks
This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, make the attribute parser
functions return a NULL name only if they don't make progress.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
f61b8a6233 parser: Fix DTD parser progress checks
This is another attempt at fixing parser progress checks. Instead of
relying on in->consumed, which could overflow, change some DTD parser
functions to make guaranteed progress on certain byte sequences.
2022-11-20 21:16:03 +01:00