Previously, xmlParseCharData and xmlParseComment would consider 0xA to
be unhandleable when seen as the first byte of an input chunk, and
fall back to xmlParseCharDataComplex and xmlParseCommentComplex, which
have different memory and performance characteristics.
FixesGNOME/libxml2#192
Running xmllint with "--sax --noout" installs a SAX2 handler with all
callbacks set to NULL. In this case or similar situations, we don't want
to switch to SAX1 parsing.
Calling a custom deallocation function in the global destructor could
cause all kinds of unexpected problems. See for example
https://github.com/sparklemotion/nokogiri/issues/2059
Only clean up if memory is managed with malloc/free.
When enabled via `./configure --enable-rebuild-docs`,
`make -C doc libxml2-api.xml` will invoke apibuild.py
to rebuild libxml2-api.xml from the sources.
But the code added in
9fa3200cb3 made it error out with
```
Parsing ../parser.c
Parse Error: parsing type : expecting a name
('Got token ', ('sep', '('))
('Last token: ', ('sep', '('))
('Token queue: ', [('name', 'destructor'), ('sep', ')'), ('sep', ')')])
('Line 14689 end: ', '')
```
When parsing the text declaration of external DTDs or entities, make
sure that parameter entities are not expanded. This also fixes a memory
leak in certain error cases.
The change to xmlSkipBlankChars assumes that the parser state is
maintained correctly when parsing external DTDs or parameter entities,
and might expose bugs in the code that were hidden previously.
Found by OSS-Fuzz.
Before, reader mode would end up in a branch that didn't handle
entities with multiple children and failed to update ent->last, so the
hack copying the "extra" reader data wouldn't trigger. Consequently,
some empty nodes in entities are correctly detected now in the test
suite. (The detection of empty nodes in entities is still buggy,
though.)
When a multiple modules (process/plugins) all link to libxml2.dll
they will in fact share a single loaded instance of it.
It is unsafe for any of them to call xmlCleanupParser,
as this would deinitialize the shared state and break others that might
still have ongoing use.
However, on windows atexit is per-module (rather process-wide), so if used
*within* libxml2 it is possible to register a clean up when all users
are done and libxml2.dll is about to actually unload.
This allows multiple plugins to link with and share libxml2 without
a premature cleanup if one is unloaded, while still cleaning up if *all*
such callers are themselves unloaded.
When ctxt->instate == XML_PARSER_EOF,xmlParseStringEntityRef
return NULL which cause a infinite loop in xmlStringLenDecodeEntities
Found with libFuzzer.
Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com>
The file input callbacks try to read from stdin if "-" is passed as URL.
This should never be done when loading indirect resources like external
entities or XIncludes. Unfortunately, the stdin substitution happens
deep inside the IO code, so we simply replace "-" with "./-" in specific
locations.
This issue also affects other users of the library like libxslt.
Ideally, stdin should only be substituted on explicit request. But more
intrusive changes could break existing code.
Closes#90 and #102.
When doc is NULL, namespace created in xmlTreeEnsureXMLDecl
is bind to newDoc->oldNs, in this case, set newDoc->oldNs to
NULL and free newDoc will cause a memory leak.
Found with libFuzzer.
Closes#82.
Using unsigned long instead of ptrdiff_t results in non-zero
pointer deltas being stored as zero delta, giving incorrect offsets
into arrays and hence out of bounds reads.
This patch fixes the issue in all places in parser.c and adds a macro
to reduce the chances of cut-and-paste errors.
Only affects platforms where 'sizeof(long) < sizeof(size_t)' like
64-bit Windows.
See https://bugs.chromium.org/p/chromium/issues/detail?id=894933Closes#44.
If there's an error growing the input buffer when recovering from
invalid QNames, make sure to return NULL. Otherwise, callers could be
confused. In xmlParseStartTag2, for example, `tlen` could become
negative.
Found by OSS-Fuzz.
The patch fixes the parser not halting immediately when the error
handler attempts to stop the parser.
Rather it was running on and continuing to reference the freed buffer
in the while loop termination test.
This is only a problem if xmlStopParser is called from an error
handler. Probably caused by commit 123234f2. Fixes#58.
For https://bugzilla.gnome.org/show_bug.cgi?id=772726
This was used in xmlsec to detect issues with accessing external entities
and prevent them, but was unreliable, based on a patch from Aleksey Sanin
* parser.c: make sure input_id is incremented when creating sub-entities
for parsing or when parsing out of context
If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must
not be run because the correct CPPFLAGS aren't set. It is actually not
required have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H.
Only check for LIBXML_ZLIB_ENABLED and remove HAVE_ZLIB_H macro.
Fixes bug 764657, bug 787041.
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.
Fixes bug 784861.
Don't include windows.h and wsockcompat.h from config.h but only when
needed.
Don't define _WINSOCKAPI_ manually. This was apparently done to stop
windows.h from including winsock.h which is a problem if winsock2.h
wasn't included first. But on MinGW, this causes compiler warnings.
Define WIN32_LEAN_AND_MEAN instead which has the same effect.
Always use the compiler-defined _WIN32 macro instead of WIN32.
On 64-bit Windows, `long` is 32 bits wide and can't hold a pointer.
Switch to ptrdiff_t instead which should be the same size as a pointer
on every somewhat sane platform without requiring C99 types like
intptr_t.
Fixes bug 788312.
Thanks to J. Peter Mugaas for the report and initial patch.
In attribute content, don't emit entity references if there are
problems with the entity value. Otherwise some illegal entity values
like
<!ENTITY a '&#x123456789;'>
would later cause problems like integer overflow.
Make xmlStringLenDecodeEntities return NULL on more error conditions
including invalid char refs and errors from recursive calls. Remove
some fragile error checks based on lastError that shouldn't be
needed now. Clear the entity content in xmlParseAttValueComplex if
an error was found.
Found by OSS-Fuzz. Should fix bug 783052.
Also see https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3343
This reverts commit 79c8a6b which caused a serious regression in
streaming mode.
Also reverts part of commit 52ceced "Fix infinite loops with push
parser in recovery mode".
Fixes bug 786554.
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
ask conversion of code to ISO Latin 1 to avoid having the compiler assume
EBCDIC codepoint for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special avoid dsome of the expectstions from
Unix/Windows