The children member of entity reference nodes points to the entity
declaration and must never be followed when traversing a tree. In
the worst case, this could lead to an infinite loop.
It's somewhat unclear how moving entity references to other documents
should work exactly. For now we simply set the children pointer to NULL
to avoid a reference to the original document.
Fixes#42.
Client code should never add document nodes as children of other nodes,
but even our own XPointer code has a bug that can produce such trees.
Make sure to really free nested documents. Also see commits 0815302d
and 0762c9b6.
Should fix#269.
For historical reasons, the string API operates with int indices which
can overflow, especially on 64-bit systems. libxml2 always made the
tacit assumption that strings will be never larger than INT_MAX bytes.
It should be considered a bug if any part of the code can produce
larger strings, whether they are externally visible or not.
Likewise, API users are expected not to supply strings larger than
INT_MAX bytes. This requirement isn't documented. But even if it was,
we must handle larger strings passed in by accident without causing
memory errors.
- xmlStrndup, xmlCharStrndup, xmlUTF8Strndup
Avoid integer overflow if len == INT_MAX.
- xmlStrlen, xmlUTF8Strsize, xmlUTF8Strloc
Avoid integer overflow by using size_t for index. If an input string
larger than INT_MAX bytes is detected, these functions now return 0
instead of a wrong and possibly negative value.
- xmlCheckUTF8
Avoid integer overflow by limiting index range.
- xmlStrncat, xmlStrncatNew, xmlEscapeFormatString
Avoid integer overflow. Return NULL instead of producing strings
larger than INT_MAX bytes.
This enables the remaining checks from the "integer" group:
- implicit-unsigned-integer-truncation
- implicit-signed-integer-truncation
- implicit-integer-sign-change
These checks can find all kinds of bugs and only require explicit casts
if integer truncation or sign change is really intended.
Use unsigned long for temporary variable to avoid integer conversion
warnings with UBSan.
Note that this does change the computation of hash values for input
bytes larger than 0x7F. Before, these bytes were first converted to a
(typically) signed char with a negative value, then to a large unsigned
long near ULONG_MAX. I doubt that this was intentional. Input bytes
larger than 0x7F are now converted to unsigned long unchanged.
Also set ctxt->base when updating ctxt->cur. Always restore ctxt->cur
on error. Avoids integer truncation and wrong column numbers in
xmlXPathErr.
Stop hiding modification of ctxt members behind a macro.
Found with UBSan.
Remove the "tarname" added in commit 7c0253aa. Having a tarname
including a version number would result in tarballs named
libxml2-2.9.12-2.9.12.tar.gz.
This change also means that documentation will now be installed in
$(datadir)/doc/libxml2 instead of $(datadir)/doc/libxml2-$(version).
Having a version number in the documentation directory doesn't seem
helpful. The new location also matches the default autotools $(docdir).
xmlMemSetup must be called before initializing the parser, otherwise
some data structures will be allocated with system malloc instead of
our custom allocator. This throws off built-in memory debugging and
sanitizers.
This updates setup.py.in to pack the DLLs according to the options we specified
to configure.js or CMake (or, even configure, although autotools builds are not
likely to build the libxml2 Python module via distutils).
At this point, we can pack only the DLLs that libxml2 really depends on, and
pack the libxslt DLLs only if we really built the libxslt Python modules.
Also make the DLL filenames more easily configured
On Windows, we don't have fcntl() which helps us to find out how a file was
opened, so we need to resort to the Windows API NtQueryInformationFile() in
ntdll.dll to help us, and compare the file access modes as appropriate to
deduce the modes we want to pass into fdopen().
As all official Python 3.x releases are built against newer Windows CRTs that
toughen checks on the validity of the file descriptor when we convert the fd to
a native Windows File Handle using _get_osfhandle(), we need to define an empty
handler so that the program does not abort if the fd that was passed in was
invalid; instead, we just return NULL if _get_osfhandle() could not return us a
valid Windows File Handle.
Fix a bug in xmlCharEncOutput return value which will cause
xmlNodeDumpOutput to drop characters randomly.
xmlCharEncOutput returns zero if the length of the input buffer is
zero but ignores the fact that it may already encoded the input buffer
and the input's length is zero due to the fact that xmlEncOutputChunk
returned -2 errors and underlying code tries to fix the error by
encoding the input.
xmlCharEncOutput is collecting the number of bytes written to the
output buffer but is returning zero instead of the total number of
bytes in this situation. This commit will fix this issue by returning
the total number of bytes instead. So the xmlNodeDumpOutput will also
continue writing and will not stop due to the fact that it mistakenly
thinks the output buffer is not changed in that iteration.
Fixes#314
Previous to this commit, the examples where installed haphazardly within
all the other html documents, also overwriting index.html, for example.
Signed-off-by: Mattia Rizzolo <mattia@mapreri.org>
This is a completely noop change for this project, since before this
commit nothing was using $docdir nor PROGRAM_TARNAME.
Setting the fourth parameter of AC_INIT() makes it set PROGRAM_TARNAME,
which then used as the last path component of the default docdir,
effectively making $docdir be the same as the previous
$BASE_DIR/$DOC_MODULE.
Signed-off-by: Mattia Rizzolo <mattia@mapreri.org>
Nothing uses the results from these checks, so remove the checks. There
are some "uses" in order to suppress macro shadowing in MSVC's
implementation of `isinf` and `isnan` as macros, but those are
hard-coded and do not require checks to manage.