It's enough if xmlPopInput is called from xmlSkipBlankChars. Since the
replacement text of a parameter entity is surrounded with space
characters, that's the only place where the replacement can end in a
well-formed document.
This is also required to get rid of the "blanks wrapper" hack.
There are only two places where parameter entity references must be
handled: in xmlParseInternalSubset for the internal subset, and in
xmlSkipBlankChars for the external subset or content from other
external PEs.
Make sure that xmlSkipBlankChars skips over sequences of PEs and
whitespace. Rely on xmlSkipBlankChars instead of calling
xmlParsePEReference directly when in the external subset or a
conditional section.
xmlParserHandlePEReference is unused now.
Before, truncated UTF-8 sequences at the end of a file were treated as
EOF. Create an error message containing the offending bytes.
xmlStringCurrentChar would also print characters from the input stream,
not the string it's working on.
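As an illustration of the truncated-sequence case, here is a minimal
sketch in plain C. It is not the libxml2 code: the helper names
(sketch_utf8_len, sketch_report_truncated) and the message wording are
assumptions made for this example, and continuation bytes are not
validated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not libxml2 API): length of a UTF-8 sequence as
 * announced by its lead byte; any other byte counts as length 1. */
static size_t sketch_utf8_len(unsigned char lead) {
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Returns 1 and fills msg if buf ends inside a multi-byte sequence,
 * otherwise returns 0 and leaves msg empty. */
static int sketch_report_truncated(const unsigned char *buf, size_t len,
                                   char *msg, size_t msgsz) {
    size_t i = 0;

    assert(buf != NULL && msg != NULL && msgsz > 0);
    msg[0] = '\0';
    while (i < len) {
        size_t n = sketch_utf8_len(buf[i]);
        if (n > len - i) {
            /* The sequence runs past the end: name the offending bytes
             * instead of silently treating this as EOF. */
            snprintf(msg, msgsz, "truncated multi-byte sequence at EOF:");
            for (; i < len; i++) {
                size_t used = strlen(msg);
                if (used + 1 >= msgsz)
                    break;
                snprintf(msg + used, msgsz - used, " 0x%02X",
                         (unsigned) buf[i]);
            }
            return 1;
        }
        i += n;
    }
    return 0;
}
```

The message is built from the buffer passed in, not from any other
stream, which mirrors the xmlStringCurrentChar point above.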
Call xmlBufResetInput before bailing out if switching the encoding
fails. Otherwise, the input pointers are left in an invalid state.
This would typically lead to an internal error in xmlGROW but could also
cause other unforeseen problems.
For https://bugzilla.gnome.org/show_bug.cgi?id=758606
* parserInternals.c:
(xmlNextChar): Add a test to catch other issues on ctxt->input
corruption proactively.
For non-UTF-8 charsets, xmlNextChar() failed to check for the end
of the input buffer and would continue reading. Fix this by
pulling out the check for the end of the input buffer into common
code, and return if we reach the end of the input buffer
prematurely.
* result/HTML/758606.html: Added.
* result/HTML/758606.html.err: Added.
* result/HTML/758606.html.sax: Added.
* result/HTML/758606_2.html: Added.
* result/HTML/758606_2.html.err: Added.
* result/HTML/758606_2.html.sax: Added.
* test/HTML/758606.html: Added test case.
* test/HTML/758606_2.html: Added test case.
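The idea of a common end-of-buffer check can be sketched as follows.
This is only an illustration under assumptions: SketchCursor and
sketch_next_byte are hypothetical names, not the actual xmlNextChar
code, and only the single-byte charset path is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the base/cur/end pointers of a parser input. */
typedef struct {
    const unsigned char *base;
    const unsigned char *cur;
    const unsigned char *end;
} SketchCursor;

/* Advance by one byte in a single-byte charset.
 * Returns 1 if a byte was consumed, 0 at the end of the buffer and
 * -1 if the pointers are inconsistent (corrupted input state). */
static int sketch_next_byte(SketchCursor *in) {
    assert(in != NULL);
    /* Proactive sanity check on the input pointers. */
    if (in->base == NULL || in->cur == NULL || in->end == NULL ||
        in->cur < in->base || in->cur > in->end)
        return -1;
    /* Common end-of-buffer check, done before any charset-specific work. */
    if (in->cur == in->end)
        return 0;
    in->cur++;
    return 1;
}
```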
For https://bugzilla.gnome.org/show_bug.cgi?id=756527
and was also raised by the Chromium team in the past.
When we hit a conversion failure when switching encoding,
it is better to stop parsing there. This was treated as a
fatal error, but the parser continued processing to extract
more errors. Unfortunately, that makes little sense as the data
is obviously corrupt and can potentially lead to unexpected behaviour.
If entity expansion in the XML parser is asked for,
it is possible to craft a relatively small input document leading
to excessive on-the-fly content generation.
This patch accounts for those replacements and stops parsing
after a given threshold. It can be bypassed as usual with the
HUGE parser option.
https://bugzilla.gnome.org/show_bug.cgi?id=692915
The new set of converting functions tried to limit the encoding
conversion of the raw buffer to the consumption one to work in
a more progressive fashion. Unfortunately this was bad for
performance and led to errors on progressive parsing when
a very large chunk was close to the end of the document. Fix
the new internal function and switch back to the old way of
converting. Fix another bug in the process.
Those can be overridden by the XML_PARSE_HUGE option; they
are just default limits for Name length, dictionary size limits
and maximum amount of parser lookup.
* include/libxml/parserInternals.h: define the limits
* include/libxml/xmlerror.h: add a new error
* parser.c parserInternals.c: implements the new limits
This was scattered in a number of modules: xmlParserInputPtr
structures usually have their base, cur and end pointers set from an
xmlBuf used as input.
* buf.c buf.h: add a new function implementing this setup
* parser.c HTMLparser.c catalog.c parserInternals.c xmlreader.c
use the new function instead of digging into the buffer in
all those modules
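The kind of setup being centralized can be sketched like this. The
types and the function name are hypothetical simplifications for
illustration, not the real xmlBuf or xmlParserInput definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer and input structures. */
typedef struct {
    unsigned char *content; /* start of the buffered data */
    size_t use;             /* number of valid bytes in content */
} SketchBuf;

typedef struct {
    const unsigned char *base;
    const unsigned char *cur;
    const unsigned char *end;
} SketchBufInput;

/* Point the input at the start of the buffer.
 * Returns 0 on success, -1 on invalid arguments. */
static int sketch_reset_input(const SketchBuf *buf, SketchBufInput *in) {
    if (buf == NULL || in == NULL || buf->content == NULL)
        return -1;
    in->base = buf->content;
    in->cur = buf->content;
    in->end = buf->content + buf->use;
    /* Invariant every caller relies on afterwards. */
    assert(in->base <= in->cur && in->cur <= in->end);
    return 0;
}
```

Keeping this in one function means each module no longer needs to know
the buffer layout to keep the three pointers consistent.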
The main changes are where the internals of the buffer structure
were addressed directly; we now use routines coming from buf.h.
The routine xmlParserInputRead() which wasn't used anywhere is
deprecated too.
For https://bugzilla.gnome.org/show_bug.cgi?id=643148
Reported by Bill Clarke <llib@computer.org>, it used a global variable
as a counter for the input id and this was not thread safe. To avoid the
race without adding unneeded locking in the parser path, move the id to
the parser context instead.
* HTMLparser.c: new htmlParseElementInternal non recursive, with
htmlParseContentInternal and new function to handle node info
and element end.
* include/libxml/parser.h: add new stack for element info in parser
context
* parserInternals.c: free element info stack
* encoding.c parser.c parserInternals.c: when we autodetect an encoding
but it's actually not completely compatible with the one declared
great care must be taken to not convert more than just the first line.
Led to some refactoring, more private functions and a bit of cleanup.
* parser.c: avoid a warning on 64bits introduced earlier
* parserInternals.c: add more checks on the UTF-8 input
Daniel
svn path=/trunk/; revision=3676
* HTMLparser.c configure.in parserInternals.c runsuite.c runtest.c
testapi.c xmlschemas.c xmlschemastypes.c xmlstring.c: fixed a number
of warnings shown by HP-UX compiler and reported by Rick Jones
Daniel
* encoding.c parserInternals.c: avoid passing a char[] as snprintf
first argument.
* threads.c include/libxml/threads.h: implemented xmlIsThreadsEnabled()
based on Andrew W. Nosenko's idea.
* doc/* elfgcchack.h: regenerated the API
Daniel
* parserInternals.c: fix bug raised by zamez on IRC
* testapi.c: regenerated, seems to pop-up leaks in new tree functions
* tree.c: added comments missing.
* doc/*: regenerated
Daniel
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
format to cope with gcc4 change of aliasing allowed scopes, had
to add extra information to doc/libxml2-api.xml to separate
the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc/* testapi.c: regenerated when rebuilding the API
Daniel
* parserInternals.c: fixed to skip (if necessary) the BOM for
encoding 'utf-16'. Completes the fix for bug #152286.
* tree.c, parser.c: minor warning cleanup, no change to logic
* gentest.py testapi.c: better handling of conditional features
* HTMLparser.c SAX2.c parserInternals.c xmlwriter.c: more testing
on parser contexts, closed leaks, error messages
Daniel
* gentest.py testapi.c: autogenerate a minimal NULL value sequence
for unknown pointer types
* HTMLparser.c SAX2.c chvalid.c encoding.c entities.c parser.c
parserInternals.c relaxng.c valid.c xmlIO.c xmlreader.c
xmlsave.c xmlschemas.c xmlschemastypes.c xmlstring.c xpath.c
xpointer.c: This uncovered an impressive amount of entry points
not checking for NULL pointers when they ought to, closing all
the open gaps.
Daniel
* xmlmemory.c include/libxml/xmlmemory.h: adding xmlMemBlocks()
* Makefile.am gentest.py testapi.c: work on generator of an
automatic API regression test tool.
* SAX2.c nanoftp.c parser.c parserInternals.c tree.c xmlIO.c
xmlstring.c: various API hardening changes as a result of running
the first set of automatic API regression tests.
* test/slashdot16.xml: apparently missing from CVS, committed it
Daniel
* python/libxml.c: register xmlSchemaSetValidErrors, patch from
Brent Hendricks in the mailing-list
* include/libxml/valid.h HTMLparser.c SAX2.c valid.c
parserInternals.c: fix #156626 and more generally how to find out
if a validation context is part of a parsing context or not. This
can probably be improved to make 100% sure that vctxt->userData
is the parser context too. It's a bit hairy because we can't
change the xmlValidCtxt structure without breaking the ABI since
this changes xmlParserCtxt information indexes.
Daniel
* xmlIO.c: small typo pointed out by Mike Hommey
* doc/xmllint.xml, xmllint.html, xmllint.1: slightly improved
the --c14n description, c.f. #144675 .
* nanohttp.c nanoftp.c: applied a first simple patch from
Mike Hommey for $no_proxy, c.f. #133470
* parserInternals.c include/libxml/parserInternals.h
include/libxml/xmlerror.h: cleanup to avoid 'error' identifier
in includes #
* parser.c SAX2.c debugXML.c include/libxml/parser.h:
first version of the implementation of parsing within
the context of a node in the tree #142359, new function
xmlParseInNodeContext(), added support at the xmllint --shell
level as the "set" function
* test/scripts/set* result/scripts/* Makefile.am: extended
the script based regression tests to instrument the new function.
Daniel
* parserInternals.c xmlIO.c encoding.c include/libxml/parser.h
include/libxml/xmlIO.h: added xmlByteConsumed() interface
* doc/*: updated the benchmark, rebuilt the docs
* python/tests/Makefile.am python/tests/indexes.py: added a
specific regression test for xmlByteConsumed()
* include/libxml/encoding.h rngparser.c tree.c: small cleanups
Daniel
* xinclude.c: remove the warning on the 2001 namespace
* parser.c parserInternals.c xpath.c: remove some warnings
when compiling with MSVC6
* nanohttp.c: applied a patch when using _WINSOCKAPI_
Daniel