mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00

151 Commits

Author SHA1 Message Date
Nick Wellnhofer
05d9bacd05 regexp: Improve error handling
Handle malloc failure from xmlRaiseError.

Use xmlRaiseMemoryError.

Remove argument from memory error handler.

Remove TODO macro.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
1a354d5b30 regexp: Report malloc failures
Fix places where malloc failures aren't reported.
2023-12-11 22:13:05 +01:00
Nick Wellnhofer
3e7673bc2d malloc-fail: Report malloc failure in xmlFARegExec 2023-09-29 00:15:40 +02:00
Nick Wellnhofer
b7d56ef7f1 malloc-fail: Report malloc failure in xmlRegEpxFromParse
Also check whether malloc failures are reported when fuzzing.
2023-09-22 19:53:11 +02:00
Nick Wellnhofer
f98fa86318 regexp: Fix status codes and handle invalid UTF-8
Fixes #561.
2023-09-22 19:01:11 +02:00
Nick Wellnhofer
4e1c13ebfd debug: Remove debugging code
This is barely useful these days and only clutters the code base.
2023-09-19 17:35:09 +02:00
Nick Wellnhofer
a800b7e058 regexp: Fix null deref in xmlFAFinishReduceEpsilonTransitions
Short-lived regression found by OSS-Fuzz.
2023-05-04 12:47:00 +02:00
Nick Wellnhofer
c613ab14b8 regexp: Fix mistake in previous commit
The `ret = 0` line should have been deleted.

Fixes #531.
2023-05-02 00:32:50 +02:00
Nick Wellnhofer
a06eaa6119 regexp: Fix determinism checks
Swap arguments in initial call to xmlFARecurseDeterminism.

Fix the check whether we revisit the initial state in
xmlFARecurseDeterminism.

If there are transitions with equal atoms and targets but different
counters, treat the regex as deterministic but mark the transitions as
non-deterministic internally.

Don't overwrite zero return value of xmlFAComputesDeterminism
with non-zero value from xmlFARecurseDeterminism.

Most of these errors lead to non-deterministic regexes not being
detected which typically isn't an issue. The improved code may break
users who relied on buggy behavior or cause other bugs to become
visible.

Fixes #469.
2023-04-30 22:37:11 +02:00
Nick Wellnhofer
e301865e69 regexp: Fix checks for eliminated transitions
'to' can be set to -1 or -2 when eliminating transitions, so check for
all negative values.
2023-04-30 22:36:51 +02:00
Nick Wellnhofer
90759c598d regexp: Simplify xmlFAReduceEpsilonTransitions 2023-04-30 22:36:41 +02:00
Nick Wellnhofer
9f7b114232 regexp: Fix cycle check in xmlFAReduceEpsilonTransitions
The visited flag must only be reset after the first call to
xmlFAReduceEpsilonTransitions has finished. Visiting states multiple
times could lead to unnecessary processing of duplicate transitions.

Similar to 68eadabd.
2023-04-30 22:36:33 +02:00
Nick Wellnhofer
85057e5131 regexp: Add sanity check in xmlRegCalloc2
These arguments should be non-zero, but add a sanity check to avoid
division by zero.

Fixes #450.
2023-02-21 15:43:32 +01:00
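
The sanity check described in 85057e5131 can be sketched roughly as below. This is a minimal illustration and not the libxml2 source: the helper name `calloc2` and its exact signature are assumptions. The point is that an overflow check which divides by the array dimensions must first reject zero divisors.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only; not libxml2's xmlRegCalloc2.
 * Allocates a zeroed dim1 x dim2 array of elements of size elemSize.
 */
static void *
calloc2(size_t dim1, size_t dim2, size_t elemSize) {
    size_t totalSize;
    void *ret;

    /* Reject zero divisors before they are used in the overflow check. */
    if (dim2 == 0 || elemSize == 0)
        return NULL;
    /* dim1 * dim2 * elemSize would exceed SIZE_MAX. */
    if (dim1 > SIZE_MAX / dim2 / elemSize)
        return NULL;

    totalSize = dim1 * dim2 * elemSize;
    ret = malloc(totalSize);
    if (ret != NULL)
        memset(ret, 0, totalSize);
    return ret;
}
```

A caller treats a NULL return as an allocation failure, whether it came from `malloc` or from one of the guards.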
Nick Wellnhofer
1743c4c3fc malloc-fail: Fix OOB read after xmlRegGetCounter
Found with libFuzzer, see #344.
2023-02-17 17:18:59 +01:00
Nick Wellnhofer
40bc1c699a malloc-fail: Fix memory leak in xmlFAParseCharProp
Found with libFuzzer, see #344.
2023-02-17 17:18:55 +01:00
Nick Wellnhofer
e64653c0e7 malloc-fail: Fix leak of xmlRegAtom
Found with libFuzzer, see #344.
2023-02-17 17:18:55 +01:00
Nick Wellnhofer
ed615967df malloc-fail: Fix memory leak in xmlRegexpCompile
Found with libFuzzer, see #344.
2023-02-17 17:18:55 +01:00
Nick Wellnhofer
e60c9f4c4b malloc-fail: Fix memory leak after xmlRegNewState
Invoke xmlRegNewState from xmlRegStatePush to simplify error handling.

Found with libFuzzer, see #344.
2023-02-17 17:16:51 +01:00
Nick Wellnhofer
bd33331bb9 regexp: Simplify xmlRegAtomPush 2023-02-17 17:16:50 +01:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Nick Wellnhofer
145170125a Fix parsing of subtracted regex character classes
Fixes #370.
2022-04-23 19:22:42 +02:00
Nick Wellnhofer
ebb1797030 Remove unneeded #includes 2022-03-04 22:11:49 +01:00
Damjan Jovanovic
37ebf8a8b2 Document support for the non-standard escape sequences.
Support non-BMP code points in surrogate pairs of '\uXXXX\uXXXX'.
2022-03-02 15:25:21 +00:00
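
For reference, combining a '\uXXXX\uXXXX' surrogate pair into a single non-BMP code point follows the standard UTF-16 formula. The sketch below is an illustration only; the function name `combine_surrogates` is hypothetical and does not come from xmlregexp.c.

```c
/*
 * Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
 * Returns the code point, or -1 if hi/lo are not a valid high/low pair.
 */
static int
combine_surrogates(unsigned int hi, unsigned int lo) {
    if (hi < 0xD800 || hi > 0xDBFF)
        return -1;              /* not a high surrogate */
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;              /* not a low surrogate */
    /* 10 bits from each half, offset by 0x10000. */
    return (int) (0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00));
}
```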
Damjan Jovanovic
b66c19612c Use strtoul() instead of sscanf, and correct data types that break GCC. 2022-03-02 15:25:21 +00:00
Damjan Jovanovic
ec8ff95ce3 Add support for some non-standard escapes in regular expressions.
This adds support for some non-standard escape sequences observed
in Microsoft's MSXML DLLs and used by Windows apps, and thus
needed by Wine. Some are also used in other XML implementations,
eg. Java's.

This isn't intended to be final. We probably wish to toggle these
non-standard escape sequences on and off somehow, as needed by
the caller.

Further discussion: https://gitlab.gnome.org/GNOME/libxml2/-/issues/260
2022-03-02 15:25:21 +00:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
ea6e8f998d Fix certain combinations of regex range quantifiers
Fix regex transitions that have both min/max and a counter. In this
case, we want to save the regex state before incrementing the counter.

Fixes #301 and the issue reported here:

https://mail.gnome.org/archives/xml/2016-April/msg00017.html
2022-02-28 16:56:02 +01:00
Nick Wellnhofer
382fb056b5 Fix range quantifier on subregex
Make sure to add counted exit transitions before other counter
transitions. Otherwise, we won't backtrack correctly.

Fixes #65.
2022-02-28 16:56:02 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Arne Becker
ec6e3efb06 Patch to forbid epsilon-reduction of final states
When building the internal representation of a regexp, it is possible
that a lot of empty transitions are created. Therefore there is a step
to reduce them in the function xmlFAEliminateSimpleEpsilonTransitions.

There is an error there for this case:

* State 1 has a transition with an atom (in this case "a") to state 2.
* State 2 is final and has an epsilon transition to state 1.

After reduction it looked like:
* State 1 has a transition with an atom (in this case "a") to itself
  and is final.

In other words, the empty string is accepted when it shouldn't be.

The attached patch skips the reduction step for final states.
An alternative would be to insert or increment counters when reducing a
final state, but this seemed error prone and unnecessary, since there
aren't that many final states.

Fixes #282
2021-07-06 21:59:25 +02:00
Nick Wellnhofer
7d6837ba0e Fix caret in regexp character group
Apply Per Hedeland's patch from

    https://bugzilla.gnome.org/show_bug.cgi?id=779751

Fixes #188.
2020-10-25 20:21:43 +01:00
Nick Wellnhofer
68eadabd00 Fix exponential runtime in xmlFARecurseDeterminism
In order to prevent visiting a state twice, states must be marked as
visited for the whole duration of graph traversal because states might
be reached by different paths. Otherwise state graphs like the
following can lead to exponential runtime:

  ->O-->O-->O-->O-->O->
     \ / \ / \ / \ /
      O   O   O   O

Reset the "visited" flag only after the graph was traversed.

xmlFAComputesDeterminism still has massive performance problems when
handling fuzzed input. By design, it has quadratic time complexity in
the number of reachable states. Some issues might also stem from
redundant epsilon transitions. With this fix, fuzzing regexes with a
maximum length of 100 becomes feasible at least.

Found with libFuzzer.
2020-07-31 11:55:13 +02:00
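
The idea behind 68eadabd00 can be sketched on a toy graph. The `State` type and the functions below are hypothetical and much simpler than libxml2's automaton. The walk leaves the "visited" flag set for the whole traversal and clears it only once the traversal is complete, so every state is processed at most once even when it is reachable by many paths.

```c
#include <stddef.h>

#define MAX_SUCC 2

/* Hypothetical state type for illustration; not libxml2's xmlRegState. */
typedef struct {
    int succ[MAX_SUCC];   /* successor indices, -1 if unused */
    int visited;
} State;

/* Depth-first walk; returns how many states were newly visited. */
static long
walk(State *states, int idx) {
    long count = 1;
    int i;

    if (states[idx].visited)
        return 0;
    states[idx].visited = 1;   /* stays set until the walk is over */
    for (i = 0; i < MAX_SUCC; i++) {
        if (states[idx].succ[i] >= 0)
            count += walk(states, states[idx].succ[i]);
    }
    return count;
}

/* Traverse from start, then reset all flags in one pass. */
static long
traverse(State *states, size_t nstates, int start) {
    long count = walk(states, start);
    size_t i;

    for (i = 0; i < nstates; i++)
        states[i].visited = 0;
    return count;
}

/* Build a chain of k diamonds like the diagram above (2k + 1 states). */
static void
make_diamond_chain(State *states, int k) {
    int j;

    for (j = 0; j < k; j++) {
        states[2 * j].succ[0] = 2 * j + 2;       /* straight edge */
        states[2 * j].succ[1] = 2 * j + 1;       /* detour */
        states[2 * j].visited = 0;
        states[2 * j + 1].succ[0] = 2 * j + 2;   /* detour rejoins */
        states[2 * j + 1].succ[1] = -1;
        states[2 * j + 1].visited = 0;
    }
    states[2 * k].succ[0] = -1;
    states[2 * k].succ[1] = -1;
    states[2 * k].visited = 0;
}
```

If the flag were instead cleared when each recursive call returns, every diamond would double the number of paths explored, which is the exponential behavior the commit message describes.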
Nick Wellnhofer
fc842f6eba Limit regexp nesting depth
Enforce a maximum nesting depth of 50 for regular expressions. Avoids
stack overflows with deeply nested regexes.

Found by OSS-Fuzz.
2020-07-06 15:22:12 +02:00
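
A depth limit of this kind can be sketched for a recursive-descent parser as follows. The `Parser` type, the function names, and the choice to accept exactly 50 levels are assumptions made for illustration. The actual check in xmlregexp.c may differ in detail.

```c
#define MAX_DEPTH 50

/* Hypothetical parser state; only tracks position and nesting depth. */
typedef struct {
    const char *cur;
    int depth;
} Parser;

static int parse_group(Parser *p);

/* Parse a sequence up to end of input or a closing parenthesis. */
static int
parse_seq(Parser *p) {
    while (*p->cur != '\0' && *p->cur != ')') {
        if (*p->cur == '(') {
            if (parse_group(p) < 0)
                return -1;
        } else {
            p->cur++;
        }
    }
    return 0;
}

/* Parse "( ... )", refusing to recurse past MAX_DEPTH levels. */
static int
parse_group(Parser *p) {
    int ret;

    if (p->depth >= MAX_DEPTH)
        return -1;              /* too deeply nested */
    p->depth++;
    p->cur++;                   /* skip '(' */
    ret = parse_seq(p);
    p->depth--;
    if (ret < 0 || *p->cur != ')')
        return -1;              /* nested error or missing ')' */
    p->cur++;                   /* skip ')' */
    return 0;
}

/* Returns 0 if parentheses are balanced and within the depth limit. */
static int
check_regex_nesting(const char *s) {
    Parser p;

    p.cur = s;
    p.depth = 0;
    if (parse_seq(&p) < 0)
        return -1;
    return (*p.cur == '\0') ? 0 : -1;   /* stray ')' */
}
```

Because the recursion depth is bounded by a constant, stack usage is bounded regardless of the input length.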
Nick Wellnhofer
f8329fdc23 Report error for invalid regexp quantifiers 2020-07-02 11:54:28 +02:00
Nick Wellnhofer
1e7851b5ae Fix integer overflow in xmlFAParseQuantExact
Found by OSS-Fuzz.
2020-06-25 12:18:21 +02:00
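
An overflow-safe way to parse the digits of an exact quantifier such as `{123}` might look like the sketch below. The function name `parse_quant_exact` and the convention of returning -1 on error are assumptions and are not taken from xmlFAParseQuantExact.

```c
#include <limits.h>

/*
 * Hypothetical helper: parse decimal digits at *cur and advance past them.
 * Returns the value, or -1 if there are no digits or the value would
 * not fit in an int.
 */
static int
parse_quant_exact(const char **cur) {
    int ret = 0;
    int ok = 0;
    int overflow = 0;

    while (**cur >= '0' && **cur <= '9') {
        int digit = **cur - '0';

        /* ret * 10 + digit would exceed INT_MAX. */
        if (ret > (INT_MAX - digit) / 10)
            overflow = 1;
        else
            ret = ret * 10 + digit;
        (*cur)++;
        ok = 1;
    }
    if (!ok || overflow)
        return -1;
    return ret;
}
```

The check runs before the multiplication, so the signed overflow never actually happens. All digits are still consumed so that the caller's position stays consistent.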
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Nick Wellnhofer
52649b63eb Check for overflow when allocating two-dimensional arrays
Found by lgtm.com
2020-01-02 15:24:23 +01:00
Nick Wellnhofer
9bd7abfba4 Remove useless comparisons
Found by lgtm.com
2020-01-02 14:14:48 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
99a864a1f7 Fix Regextests
- One of the bug316338 test cases is expected to succeed.
- Memory leak in testRegexp.c.
- Refcount handling in xmlExpHashGetEntry.
2019-09-25 15:27:45 +02:00
Nick Wellnhofer
c2b0a184a9 Fix empty branch in regex
Fixes bug 649244:
https://bugzilla.gnome.org/show_bug.cgi?id=649244

Closes #57.
2019-09-25 14:22:47 +02:00
Nick Wellnhofer
e8c9cd5c7a Fix Schema determinism check of ##other namespaces
Non-compound (##local) and compound string atoms are always disjoint
regardless of whether the compound atom is negated (##other).

Closes #40.
2019-09-16 15:36:02 +02:00
zhouzhongyuan
0b793591ac Fix memory leak in xmlRegEpxFromParse
Merge request !39
2019-09-13 15:37:56 +02:00
Nick Wellnhofer
09797c139e Fix null deref in xmlregexp error path
Thanks to Shaobo He for the report.
2019-03-05 15:14:34 +01:00
J. Peter Mugaas
d2c329a9a4 Fix -Wimplicit-fallthrough warnings
Add "falls through" comments to quench implicit-fallthrough warnings
which are enabled by -Wextra under GCC 7.
2017-10-21 13:49:31 +02:00
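
A small illustration of the comment style this commit refers to. The function is hypothetical. GCC recognizes comments like the ones below at its default -Wimplicit-fallthrough level and suppresses the warning for that case label.

```c
/* Hypothetical example: count how many cases run starting at "start". */
static int
count_from(int start) {
    int n = 0;

    switch (start) {
        case 1:
            n++;
            /* falls through */
        case 2:
            n++;
            /* falls through */
        case 3:
            n++;
            break;
        default:
            break;
    }
    return n;
}
```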
David Kilzer
fb56f80eef Heap-buffer-overflow read of size 1 in xmlFAParsePosCharGroup
Credit to OSS-Fuzz.

Add a check to xmlFAParseCharRange() for the end of the buffer
to prevent reading past the end of it.

This fixes Bug 784017.
2017-07-04 18:51:29 +02:00
Nick Wellnhofer
8a0c66986e Fix NULL pointer deref in xmlFAParseCharClassEsc
Found with libFuzzer.
2017-07-04 18:51:29 +02:00
Nick Wellnhofer
34e445674d Fix undefined behavior in xmlRegExecPushStringInternal
It's stupid, but the behavior of memcpy(NULL, NULL, 0) is undefined.
2017-06-01 14:31:27 +02:00
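
The usual way to avoid this class of undefined behavior is to skip the call when there is nothing to copy. The helper below is a hypothetical sketch and not the code from xmlRegExecPushStringInternal.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy n ints, tolerating NULL pointers when n == 0.
 * memcpy requires valid pointers even for a zero-length copy, so the
 * call is skipped entirely in that case.
 */
static void
copy_ints(int *dst, const int *src, size_t n) {
    if (n > 0)
        memcpy(dst, src, n * sizeof(int));
}
```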
Pranjal Jumde
cbb271655c Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup <https://bugzilla.gnome.org/show_bug.cgi?id=757711>
* xmlregexp.c:
(xmlFAParseCharRange): Only advance to the next character if
there is no error.  Advancing to the next character in case of
an error while parsing regexp leads to an out of bounds access.
2016-05-23 15:01:07 +08:00
Daniel Veillard
34b350048d Fix an error with regexp on nullable counted char transition
This is the first of the two issues raised by Pete Cordell
in https://mail.gnome.org/archives/xml/2016-April/msg00030.html
2016-05-09 09:28:38 +08:00