Make xmlOpenCharEncodingHandler call xmlParseCharEncoding first so we
prefer our own handlers for names like "UTF8". Only UTF-16 needs an
exception.
Make callers check the return value. For UTF-8, a NULL encoding doesn't
mean an error.
Remove unnecessary UTF-8 check from htmlFindOutputEncoder. Don't try to
look up ASCII handler since the HTML handler is always available.
Fix return code of xmlParseCharEncoding.
Should fix#744.
Handle malloc failrue from xmlRaiseError.
Use xmlRaiseMemoryError.
Stop using xmlGenericError.
Remove argument from memory error handler.
Remove TODO macro.
Remove explicit integer casts as final operation
- in assignments
- when passing arguments
- when returning values
Remove casts
- to the same type
- from certain range-bound values
The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.
Document some explicit casts as truncating and add a few missing ones.
Private functions were previously declared
- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.
Consolidate all private header files in include/private.
Patch by J Pascoe of Apple.
* HTMLtree.c:
(htmlDocContentDumpFormatOutput):
- Prior to commit b79ab6e6d9, xmlDoc.type was set to
XML_HTML_DOCUMENT_NODE before dumping the HTML output, then
restored before returning.
Similar to 8f5710379, mark more static data structures with
`const` keyword.
Also fix placement of `const` in encoding.c.
Original patch by Sarah Wilkin.
The old, non-recursive HTML serialization code would always terminate
the output with a newline. The new implementation omitted the newline
if the document node had no children. Readd the newline when
serializing empty documents.
Fixes#266.
Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted
parent pointers. This used to work with the old recursive code but the
non-recursive rewrite required parent pointers to be set correctly.
Unfortunately, lxml relies on the old behavior and passes subtrees with
a corrupted structure. Fall back to a recursive function call if an
invalid parent pointer is detected.
Fixes#255.
Check parent pointers for NULL after the non-recursive rewrite of the
serialization code. This avoids segfaults with corrupted documents
which can apparently be seen with lxml, see issue #187.
This reverts commit 960f0e2756.
This commit introduced
- an infinite loop, found by OSS-Fuzz, which could be easily fixed.
- an algorithm with quadratic runtime
- a security issue, see
https://bugzilla.gnome.org/show_bug.cgi?id=769760
A better approach is to add an option not to escape URLs at all
which libxml2 should have possibly done in the first place.
For https://bugzilla.gnome.org/show_bug.cgi?id=747301
Use simple HTML5 DOCTYPE for about:legacy-compat
HTML5 uses a DOCTYPE without a PUBLIC or SYSTEM identifier. It looks
like this:
<!DOCTYPE html>
I can't use XSLT to output this, because to get a DOCTYPE I have to
provide a PUBLIC or SYSTEM identifier. Luckily, the standards folks
recognized this and provided this semantically equivalent form for the
HTML DOCTYPE:
<!DOCTYPE html SYSTEM "about:legacy-compat">
But people don't like seeing the "legacy" identifier in their output.
They'd rather see the shiny new DOCTYPE. Since we know that
about:legacy-compat is defined by the W3C to be semantically equivalent
to the sans-SYSTEM DOCTYPE, we could just special-case it in the HTML
serializer in libxml2. So if you set the SYSTEM identifier to
"about:legacy-compat", you get an HTML5 short-form DOCTYPE.
Handle special cases of &{...} constructs as hinted in the spec
http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The python tests were reporting errors, some of it was due to
a small change in case encoding, but the main one was about
htmlSetMetaEncoding(doc, NULL) being broken by not removing
the associated meta tag anymore
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advace for example if the content had been converted
before being passed to the parser.
* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
* HTMLtree.c: htmlSetMetaEncoding should not destroy existing meta
encoding elements, plus it should not change things at all if the
encoding is the same. Also fixed htmlSaveFileFormat() to ask for
change if outputing to UTF-8.
* trionan.c: Borland C fix from Moritz Both
* testapi.c: regenerate, workaround a problem for buffer testing
* xmlIO.c HTMLtree.c: new internal entry point to hide even better
xmlAllocOutputBufferInternal
* tree.c: harden the code around buffer allocation schemes
* parser.c: restore the warning when namespace names are not absolute
URIs
* runxmlconf.c: continue regression tests if we get the expected
number of errors
* Makefile.am: run the python tests on make check
* xmlsave.c: handle the HTML documents and trees
* python/libxml.c: convert python serialization to the xmlSave APIs
and avoid some horrible hacks
Daniel
svn path=/trunk/; revision=3790
* tree.c: fix bug #322136 in xmlNodeBufGetContent when entity ref is
a child of an element (fix by Oleksandr Kononenko).
* HTMLtree.c include/libxml/HTMLtree.h: Add htmlDocDumpMemoryFormat.
* HTMLtree.c: fixed bug #310333 with a patch close to the provided
patch for HTML UTF-8 serialization
* result/HTML/script2.html: this changed the output of that test
Daniel
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
format to cope with gcc4 change of aliasing allowed scopes, had
to add extra informations to doc/libxml2-api.xml to separate
the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
* gentest.py testapi.c: augmented types supported
* HTMLtree.c tree.c xmlreader.c xmlwriter.c: a number of new
bug fixes and documentation updates.
Daniel
* gentest.py testapi.c: fixed the way the generator works,
extended the testing, especially with more real trees and nodes.
* HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch
of real problems found and fixed.
* entities.c: fix error reporting to go through the new handlers
Daniel
* xmlmemory.c include/libxml/xmlmemory.h: adding xmlMemBlocks()
* Makefile.am gentest.py testapi.c: work on generator of an
automatic API regression test tool.
* SAX2.c nanoftp.c parser.c parserInternals.c tree.c xmlIO.c
xmlstring.c: various API hardeing changes as a result of running
teh first set of automatic API regression tests.
* test/slashdot16.xml: apparently missing from CVS, commited it
Daniel
* nanohttp.c, include/libxml/nanohttp.h: added the routine
xmlNanoHTTPContentLength to the external API (bug151968).
* parser.c: fixed unnecessary internal error message (bug152060);
also changed call to strncmp over to xmlStrncmp.
* encoding.c: fixed compilation warning (bug152307).
* tree.c: fixed segfault in xmlCopyPropList (bug152368); fixed
a couple of compilation warnings.
* HTMLtree.c, debugXML.c, xmlmemory.c: fixed a few compilation
warnings; no change to logic.