1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00
Commit Graph

755 Commits

Author SHA1 Message Date
Nick Wellnhofer
58fc89e8a9 Deprecate internal parser functions 2022-08-25 21:04:57 +02:00
Nick Wellnhofer
34a050cdee Move some HTML functions to correct header file 2022-08-24 16:44:39 +02:00
Nick Wellnhofer
fd85b566f7 Mark more parser functions as deprecated
No compiler warnings generated yet.
2022-08-24 15:12:24 +02:00
Nick Wellnhofer
0e49f8826a Mark most SAX1 functions as deprecated
No compiler warnings generated yet.
2022-08-24 14:07:57 +02:00
Nick Wellnhofer
9a82b94a94 Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt
Add API functions to create a parser context with a custom SAX handler
without having to mess with ctxt->sax manually.
2022-08-24 14:07:55 +02:00
Nick Wellnhofer
5b2d07a726 Use xmlStrlen in *CtxtReadDoc
xmlStrlen handles buffers larger than INT_MAX more gracefully.
2022-08-20 17:00:50 +02:00
Nick Wellnhofer
4ad71c2d72 Fix xmlCtxtReadDoc with encoding
xmlCtxtReadDoc used to create an input stream involving
xmlNewStringInputStream. This would create a stream without an input
buffer, causing problems with encodings (see #34).

After commit aab584dc3, an error was returned even with UTF-8 encodings
which happened to work before.

Make xmlCtxtReadDoc call xmlCtxtReadMemory which doesn't suffer from
these issues. Also fix htmlCtxtReadDoc.

Fixes #397.
2022-08-20 16:34:08 +02:00
Nick Wellnhofer
5930fe0196 Reset nsNr in xmlCtxtReset 2022-07-18 20:59:45 +02:00
Nick Wellnhofer
ca2c91f139 Fix memory leak in xmlLoadEntityContent error path
Free the input stream if pushing it fails.

Found by OSS-Fuzz.

https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43743
2022-06-28 19:33:48 +02:00
Nick Wellnhofer
ecba4cbd43 Avoid double-free if malloc fails in inputPush
It's the caller's responsibility to free the input stream if this
function fails.
2022-06-28 19:33:40 +02:00
Nick Wellnhofer
3e7b4f37aa Avoid calling xmlSetTreeDoc
Create text nodes with xmlNewDocText or set the document directly to
avoid xmlSetTreeDoc being called when the node is inserted.
2022-06-20 01:49:39 +02:00
David Kilzer
44e9118c02 Prevent integer-overflow in htmlSkipBlankChars() and xmlSkipBlankChars()
* HTMLparser.c:
(htmlSkipBlankChars):
* parser.c:
(xmlSkipBlankChars):
- Cap the return value at INT_MAX.
- The commit range that OSS-Fuzz listed for the fix didn't make
  any changes to xmlSkipBlankChars(), so it seems like this
  issue may still exist.

Found by OSS-Fuzz Issue 44803.
2022-04-11 18:09:37 +00:00
David Kilzer
21561e833a Mark more static data as const
Similar to 8f5710379, mark more static data structures with
`const` keyword.

Also fix placement of `const` in encoding.c.

Original patch by Sarah Wilkin.
2022-04-07 12:01:23 -07:00
Nick Wellnhofer
92bff86614 Fix calls to deprecated init/cleanup functions
Only use xmlInitParser/xmlCleanupParser.
2022-03-29 14:18:31 +02:00
Nick Wellnhofer
9684954429 Revert "Continue to parse entity refs in recovery mode"
This reverts commit 84823b8634 which
exposed several other, potentially serious bugs.

Fixes #356.
2022-03-22 19:11:05 +01:00
Nick Wellnhofer
7d02c7291f Fix parser progress checks
Testing the current input pointer for modification is unreliable since
the input buffer could have been freed and realloced. Check whether the
input id and the up-to-date number of bytes consumed match.
2022-03-06 02:33:01 +01:00
Nick Wellnhofer
84823b8634 Continue to parse entity refs in recovery mode
There doesn't seem to be a good reason to abort in xmlParseReference
if a well-formedness error was detected. Removing this check allows to
parse entity references after an error in recovery mode.

Fixes #270.
2022-03-06 02:26:22 +01:00
Nick Wellnhofer
d99ddd9bd5 Improve buffer allocation scheme
In most places, we really need the double-it scheme to avoid quadratic
behavior. The hybrid scheme still can cause many reallocations and the
bounded scheme doesn't seem to provide meaningful protection in
xmlreader.c.
2022-03-06 02:26:22 +01:00
Nick Wellnhofer
ebb1797030 Remove unneeded #includes 2022-03-04 22:11:49 +01:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
89d9ef3ee8 Reset last error in xmlCleanupGlobals
Before, we tried to reset the last error in xmlCleanupParser. But if
xmlCleanupParser wasn't called from the main thread, this would reset
the thread-local error object. xmlCleanupGlobals has access to the
error object of the main thread and can reset it reliably.
2022-03-01 15:14:00 +01:00
Nick Wellnhofer
2489c1d024 Remove useless __CYGWIN__ checks
From what I can tell, some really early Cygwin versions from around
1998-2000 used to erroneously define _WIN32. This was eventually fixed,
but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is
unnecessary.

Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether
to use __declspec.
2022-02-28 22:58:35 +01:00
Nick Wellnhofer
c41bc10da3 Fix unused variable warnings with disabled features 2022-02-22 19:57:12 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
9edc20c154 Fix double counting of CRLF in comments
Fixes #151.
2022-02-07 20:54:07 +01:00
Nick Wellnhofer
9653565765 Make sure to grow input buffer in xmlParseMisc
Otherwise, large amount of whitespace could lead to documents not
being parsed correctly.

Fixes #299.
2022-02-07 15:43:36 +01:00
Nick Wellnhofer
d85245f934 Fix regression with PEs in external DTD
Fix a regression introduced with commit a28f7d87. In some cases,
parameter entity references in external DTDs wouldn't be expanded.

Fixes #306.
2022-01-16 21:56:10 +01:00
Yulin Li
46c658b025 move current position before possible calling of ctxt->sax->characters. 2022-01-16 15:03:12 +01:00
David King
fe564967c9 Fix memory leak in xmlCreateIOParserCtxt
Found by Coverity.

https://bugzilla.redhat.com/show_bug.cgi?id=1938806
2022-01-16 14:14:32 +01:00
Mike Dalessio
a7b9f3ebdf fix: avoid segfault at exit when using custom memory functions
This extends the fix introduced by 956534e to Windows processes
dynamically loading libxml2.

Closes #256.
2021-05-20 13:38:54 -04:00
Daniel Veillard
8598060bac Patch for security issue CVE-2021-3541
This is relapted to parameter entities expansion and following
the line of the billion laugh attack. Somehow in that path the
counting of parameters was missed and the normal algorithm based
on entities "density" was useless.
2021-05-13 14:55:12 +02:00
Nick Wellnhofer
bfd2f4300f Fix null deref in legacy SAX1 parser
Always call nameNsPush instead of namePush. The latter is unused now
and should probably be removed from the public API. I can't see how
it could be used reasonably from client code and the unprefixed name
has always polluted the global namespace.

Fixes a null pointer dereference introduced with de5b624f when parsing
in SAX1 mode.

Found by OSS-Fuzz.
2021-05-09 19:03:16 +02:00
Nick Wellnhofer
ce00c36e65 Store per-element parser state in a struct
Make the parser context's "pushTab" point to an array of structs
instead of void pointers. This avoids casting unrelated types to void
pointers, improving readability and portability, and allows for more
efficient packing. Ultimately, the struct could be extended to include
the contents of "nameTab" and "spaceTab", further simplifying the code.

Historically, "pushTab" was only used by the push parser (hence the
name), so the change to the public headers should be safe.

Also remove an unused parameter from xmlParseEndTag2.
2021-05-08 22:16:49 +02:00
Nick Wellnhofer
de5b624f10 Fix handling of unexpected EOF in xmlParseContent
Readd the XML_ERR_TAG_NOT_FINISHED error on unexpected EOF which was
removed in commit 62150ed2.

This commit also introduced a regression for direct users of
xmlParseContent. Unclosed tags weren't checked.
2021-05-08 20:47:36 +02:00
Nick Wellnhofer
3e80560d4b Fix line numbers in error messages for mismatched tags
Commit 62150ed2 introduced a small regression in the error messages for
mismatched tags. This typically only affected messages after the first
mismatch, but with custom SAX handlers all line numbers would be off.

This also fixes line numbers in the SAX push parser which were never
handled correctly.
2021-05-07 11:48:11 +02:00
Nick Wellnhofer
babe75030c Propagate error in xmlParseElementChildrenContentDeclPriv
Check return value of recursive calls to
xmlParseElementChildrenContentDeclPriv and return immediately in case
of errors. Otherwise, struct xmlElementContent could contain unexpected
null pointers, leading to a null deref when post-validating documents
which aren't well-formed and parsed in recovery mode.

Fixes #243.
2021-05-01 17:24:49 +02:00
Nick Wellnhofer
c3fd8c4295 Fix exponential behavior with recursive entities
Fix another case where only recursion depth was limited, but entities
would still be expanded over and over again.

The test case discovered by fuzzing only affected parsing in recovery
mode with XML_PARSE_RECOVER.

Found by OSS-Fuzz.
2021-03-13 17:37:09 +01:00
Mike Dalessio
afad37216b parser.c: shrink the input buffer when appropriate
Fixes GNOME/libxml2#200

Also see discussions at:
- GNOME/libxml2#192
- https://gitlab.gnome.org/nwellnhof/libxml2/-/commit/99bda1e
- https://github.com/sparklemotion/nokogiri/issues/2132
2021-02-08 17:14:35 +01:00
Nick Wellnhofer
79301d3d5e Fix timeout when handling recursive entities
Abort parsing early to avoid an almost infinite loop in certain error
cases involving recursive entities.

Found with libFuzzer.
2020-12-18 14:13:46 +01:00
Nick Wellnhofer
45da175c14 Fix memory leak in xmlParseElementMixedContentDecl
Free parsed content if malloc fails to avoid a memory leak.

Found with libFuzzer.
2020-12-18 14:11:58 +01:00
Mike Dalessio
c0c26ff201 parser.c: xmlParseCharData peek behavior fixed wrt newlines
Previously, xmlParseCharData and xmlParseComment would consider 0xA to
be unhandleable when seen as the first byte of an input chunk, and
fall back to xmlParseCharDataComplex and xmlParseCommentComplex, which
have different memory and performance characteristics.

Fixes GNOME/libxml2#192
2020-10-25 20:00:59 +01:00
yanjinjq
7929f05710 Fix SEGV in xmlSAXParseFileWithData
Fixes #181.
2020-09-21 13:12:31 +02:00
Nick Wellnhofer
99fc048d7f Don't use SAX1 if all element handlers are NULL
Running xmllint with "--sax --noout" installs a SAX2 handler with all
callbacks set to NULL. In this case or similar situations, we don't want
to switch to SAX1 parsing.
2020-08-17 01:17:39 +02:00
Nick Wellnhofer
b82fa3dd26 Fix column number accounting in xmlParse*NameAndCompare
Thanks to Frederic Vancraeyveldt for the report.
2020-08-09 15:02:01 +02:00
Nick Wellnhofer
438e595a8c Stop counting nbChars in parser context
The value was inaccurate and never used.
2020-08-09 15:01:45 +02:00
Nick Wellnhofer
956534e02e Check for custom free function in global destructor
Calling a custom deallocation function in the global destructor could
cause all kinds of unexpected problems. See for example

    https://github.com/sparklemotion/nokogiri/issues/2059

Only clean up if memory is managed with malloc/free.
2020-08-04 19:27:13 +02:00
David Kilzer
0e5c4fec15 Reset XML parser input before reporting errors
Apply changes to htmlParseChunk() in 13ba5b61 and 3f18e748 to
xmlParseChunk().
2020-07-19 14:10:33 +02:00
Martin Vidner
43a8836cde Fix rebuilding docs, by hiding __attribute__((...)) behind a macro.
When enabled via `./configure --enable-rebuild-docs`,
`make -C doc libxml2-api.xml` will invoke apibuild.py
to rebuild libxml2-api.xml from the sources.
But the code added in
9fa3200cb3 made it error out with

```
Parsing ../parser.c
Parse Error: parsing type : expecting a name
('Got token ', ('sep', '('))
('Last token: ', ('sep', '('))
('Token queue: ', [('name', 'destructor'), ('sep', ')'), ('sep', ')')])
('Line 14689 end: ', '')
```
2020-06-24 19:55:52 +02:00
Nick Wellnhofer
a28f7d8789 Never expand parameter entities in text declaration
When parsing the text declaration of external DTDs or entities, make
sure that parameter entities are not expanded. This also fixes a memory
leak in certain error cases.

The change to xmlSkipBlankChars assumes that the parser state is
maintained correctly when parsing external DTDs or parameter entities,
and might expose bugs in the code that were hidden previously.

Found by OSS-Fuzz.
2020-06-10 14:25:19 +02:00
Nick Wellnhofer
2e8cc66d8f xmlParseBalancedChunkMemory must not be called with NULL doc
There is no way to avoid memory leaks without a document to hold the
namespace list.
2020-05-30 15:43:34 +02:00