1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-26 00:37:43 +03:00
Commit Graph

527 Commits

Author SHA1 Message Date
Nick Wellnhofer
3eb6bf0386 parser: Stop calling xmlParserInputGrow
Introduce xmlParserGrow which takes a parser context to simplify error
handling.
2023-03-12 17:05:51 +01:00
Nick Wellnhofer
53d1cc98cf malloc-fail: Fix error code in htmlParseChunk
Found with libFuzzer, see #344.
2023-02-17 17:18:51 +01:00
Nick Wellnhofer
15b0ed0815 malloc-fail: Fix infinite loop in htmlParseDocTypeDecl
Found with libFuzzer, see #344.
2023-02-17 17:18:47 +01:00
Nick Wellnhofer
041789d9ec malloc-fail: Fix null deref in htmlnamePush
Found with libFuzzer, see #344.
2023-02-17 17:18:43 +01:00
Nick Wellnhofer
0ec9c91064 malloc-fail: Fix infinite loop in htmlParseStartTag
Found with libFuzzer, see #344.
2023-02-17 17:18:38 +01:00
Nick Wellnhofer
04c2955197 malloc-fail: Fix infinite loop in htmlParseContentInternal
Found with libFuzzer, see #344.
2023-02-17 17:18:34 +01:00
Nick Wellnhofer
f3e62035d8 malloc-fail: Fix memory leak in htmlCreatePushParserCtxt
Found with libFuzzer, see #344.
2023-02-17 17:18:29 +01:00
Nick Wellnhofer
fc256953d2 malloc-fail: Fix memory leak in htmlCreateMemoryParserCtxt
Found with libFuzzer, see #344.
2023-02-17 17:18:25 +01:00
Nick Wellnhofer
643b4e90eb malloc-fail: Fix infinite loop in htmlParseStartTag
Found with libFuzzer, see #344.
2023-02-17 17:16:52 +01:00
Nick Wellnhofer
59b3366178 error: Limit number of parser errors
Reporting errors is expensive and some abusive test cases can generate
an error for each invalid input byte. This causes the parser to spend
most of the time with error handling. Limit the number of errors and
warnings to 100.
2022-12-27 14:41:19 +01:00
Alex Richardson
4b959ee168 Remove hacky heuristic from b2dc5675e9
Checking whether the context is close to the parent context by hardcoding
250 is not portable (I noticed tests were failing on Morello since the value
is 288 there due to pointers being 128 bits). Instead we should ensure
that the XML_VCTXT_USE_PCTXT flag is not set in cases where the user data
is not actually a parser context (or ideally add a separate field but that
would be an ABI break.

From what I can see in the source, the XML_VCTXT_USE_PCTXT is only set if
the userData field points to a valid context, and if this is not the case
the flag should be cleared when changing userData rather than relying on
the offset between the two. Looking at the history, I think
d7cb33cf44 fixed most of the need for this
workaround, but it looks like there are a few more locations that need
updating; This commit changes two more places to set/clear/copy the
XML_VCTXT_USE_PCTXT flag, so this heuristic should not be needed anymore.
I've also drop two = NULL assignment in xmllint since this is not needed
after a call to memset().

There was also an uninitialized vctxt.flags (and other fields) in
`xmlShellValidate()`, which I've fixed by adding a memset() call.
2022-12-01 15:31:25 +00:00
Alex Richardson
c715ded086 Avoid creating an out-of-bounds pointer by rewriting a check
Creating more than one-past-the-end pointers is undefined behaviour in C
and while this code is unlikely to be miscompiled, I discovered that an
out-of-bounds pointer is being created using UBSan on a CHERI-enabled
system.
2022-12-01 15:30:12 +00:00
Nick Wellnhofer
c7a9b85cbb html: Improve parsing of nested lists
Allow ul/ol as immediate children of ul/ol. This is more in line with
the HTML5 spec.

Fixes #447.
2022-11-30 17:11:33 +01:00
Nick Wellnhofer
e414f82585 html: Fix htmlInitAutoClose documentation 2022-11-27 02:11:07 +01:00
Nick Wellnhofer
c93679381c html: Fix check for end of comment in push parser
Make sure to reset checkIndex. Handle case where "--" or "--!" is at the
end of the buffer. Fix "avail" check in htmlParseOrTryFinish.
2022-11-20 21:27:59 +01:00
Nick Wellnhofer
68a6518c45 parser: Rewrite push parser boundary checks
Remove inaccurate xmlParseCheckTransition check.

Remove non-incremental xmlParseGetLasts check.

Add functions that check for several boundary constructs more
accurately, keeping track of progress in ctxt->checkIndex.

Fixes #439.
2022-11-20 21:27:08 +01:00
Nick Wellnhofer
6843fc726f Remove or annotate char casts 2022-09-01 04:31:30 +02:00
Nick Wellnhofer
2cac626976 Don't use sizeof(xmlChar) or sizeof(char) 2022-09-01 03:35:19 +02:00
Nick Wellnhofer
ad338ca737 Remove explicit integer casts
Remove explicit integer casts as final operation

- in assignments
- when passing arguments
- when returning values

Remove casts

- to the same type
- from certain range-bound values

The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.

Document some explicit casts as truncating and add a few missing ones.
2022-09-01 02:33:57 +02:00
Nick Wellnhofer
65dc8a63ac Make xmlNewSAXParserCtx take a const sax handler
Also improve documentation.
2022-09-01 00:17:45 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Nick Wellnhofer
58fc89e8a9 Deprecate internal parser functions 2022-08-25 21:04:57 +02:00
Nick Wellnhofer
a308c0cdf7 Deprecate old HTML SAX API 2022-08-25 21:04:57 +02:00
Nick Wellnhofer
9a82b94a94 Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt
Add API functions to create a parser context with a custom SAX handler
without having to mess with ctxt->sax manually.
2022-08-24 14:07:55 +02:00
Nick Wellnhofer
0a04db19fc Don't mess with parser options in htmlParseDocument
Don't set ctxt->html. This member should already be initialized.

Set ctxt->linenumbers in htmlCtxtUseOptions like the XML parser does.
2022-08-24 14:06:00 +02:00
Nick Wellnhofer
d45263a262 Remove useless call to htmlDefaultSAXHandlerInit
This function is already called from xmlInitParser.
2022-08-24 14:04:35 +02:00
Nick Wellnhofer
4b184240be Remove htmlDefaultSAXHandler from non-SAX1 build
This matches long-standing behavior of the XML counterpart.
2022-08-22 14:24:25 +02:00
Nick Wellnhofer
80bd34c3c6 Don't initialize SAX handler in htmlReadMemory
The SAX handler is already initialized when creating the parser
context.
2022-08-22 14:06:37 +02:00
Nick Wellnhofer
37cedc0b15 Fix htmlReadMemory mixing up XML and HTML functions
Also see fe6890e2.
2022-08-22 14:04:07 +02:00
Nick Wellnhofer
920753c4aa Don't use default SAX handler to report unrelated errors 2022-08-22 13:48:59 +02:00
Nick Wellnhofer
38f04779f7 Fix HTML parser with threads and --without-legacy
If the legacy functions are disabled, the default "V1" HTML SAX handler
isn't initialized in threads other than the main thread.
htmlInitParserCtxt would later use the empty V1 SAX handler, resulting
in NULL documents.

Change htmlInitParserCtxt to initialize the HTML SAX handler by calling
xmlSAX2InitHtmlDefaultSAXHandler. This removes the ability to change the
default handler but is more in line with the XML parser which
initializes the SAX handler by calling xmlSAXVersion, ignoring the V1
default handler.

Fixes #399.
2022-08-22 13:48:59 +02:00
Nick Wellnhofer
5b2d07a726 Use xmlStrlen in *CtxtReadDoc
xmlStrlen handles buffers larger than INT_MAX more gracefully.
2022-08-20 17:00:50 +02:00
Nick Wellnhofer
4ad71c2d72 Fix xmlCtxtReadDoc with encoding
xmlCtxtReadDoc used to create an input stream involving
xmlNewStringInputStream. This would create a stream without an input
buffer, causing problems with encodings (see #34).

After commit aab584dc3, an error was returned even with UTF-8 encodings
which happened to work before.

Make xmlCtxtReadDoc call xmlCtxtReadMemory which doesn't suffer from
these issues. Also fix htmlCtxtReadDoc.

Fixes #397.
2022-08-20 16:34:08 +02:00
Nick Wellnhofer
e986d09cf5 Skip incorrectly opened HTML comments
Commit 4fd69f3e fixed handling of '<' characters not followed by an
ASCII letter. But a '<!' sequence followed by invalid characters should
be treated as bogus comment and skipped.

Fixes #380.
2022-08-02 14:38:09 +02:00
Nick Wellnhofer
6722d22c88 Reduce indentation in HTMLparser.c
No functional change.
2022-08-02 14:38:09 +02:00
Nick Wellnhofer
a82ea25fc8 Also reset nsNr in htmlCtxtReset 2022-07-28 21:36:10 +02:00
David Kilzer
44e9118c02 Prevent integer-overflow in htmlSkipBlankChars() and xmlSkipBlankChars()
* HTMLparser.c:
(htmlSkipBlankChars):
* parser.c:
(xmlSkipBlankChars):
- Cap the return value at INT_MAX.
- The commit range that OSS-Fuzz listed for the fix didn't make
  any changes to xmlSkipBlankChars(), so it seems like this
  issue may still exist.

Found by OSS-Fuzz Issue 44803.
2022-04-11 18:09:37 +00:00
Nick Wellnhofer
40483d0ce2 Deprecate module init and cleanup functions
These functions shouldn't be part of the public API. Most init
functions are only thread-safe when called from xmlInitParser. Global
variables should only be cleaned up by calling xmlCleanupParser.
2022-03-06 15:59:43 +01:00
Nick Wellnhofer
ebb1797030 Remove unneeded #includes 2022-03-04 22:11:49 +01:00
Mike Dalessio
d7b287b94c htmlParseComment: handle abruptly-closed comments
See guidance provided on abrutply-closed comments here:

https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment
2022-03-02 14:42:47 +00:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
4fd69f3e27 Fix recovery from invalid HTML start tags
Only try to parse a start tag if there's a '<' followed by an ASCII
letter. This is more in line with HTML5 and the old behavior in
recovery mode. Emit a literal '<' if the following character is
invalid.

Fixes #101.
Fixes #339.
2022-02-22 18:41:00 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
d7cb33cf44 Rework validation context flags
Use a bitmask instead of magic values to

- keep track whether the validation context is part of a parser context
- keep track whether xmlValidateDtdFinal was called

This allows to add addtional flags later.

Note that this deliberately changes the name of a public struct member,
assuming that this was always private data never to be used by client
code.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
96dc7f4ae6 Also register HTML document nodes
Fixes #196.
2022-02-01 16:38:29 +01:00
Finn Barber
fe6890e292 Fix htmlReadFd, which was using a mix of xml and html context functions 2022-01-16 15:31:54 +01:00
David King
e7d1c53a49 Fix memory leak in xmlFreeParserInputBuffer
Found by Coverity.

https://bugzilla.redhat.com/show_bug.cgi?id=1938806
2022-01-16 14:10:34 +01:00
Nick Wellnhofer
798bdf13f6 Different approach to fix quadratic behavior in HTML push parser
The old approach introduced a regression, see issue #312 and the
previous commit. Disable code that tries to recover from invalid start
tags. This only affects "recovery" mode.

Add a comment outlining a better fix in accordance with the HTML5 spec.
2022-01-10 14:50:20 +01:00
Nick Wellnhofer
094fc08a09 Fix regression when parsing invalid HTML tags in push mode
Revert part of commit 173a0830 that changed behavior when parsing
malformed start tags with the push parser. This reintroduces quadratic
behavior in recovery mode which will be worked around in the next
commit.

Fixes #312.
2022-01-10 14:49:00 +01:00
Nick Wellnhofer
2732b23466 Fix regression parsing public IDs literals in HTML
Fix regression introduced when reworking htmlParsePubidLiteral in
commit 93ce33c2.

Fixes #318.
2022-01-10 13:37:59 +01:00