mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-26 00:37:43 +03:00

151 Commits

Author SHA1 Message Date
Nick Wellnhofer
453dff1e3b Remove unnecessary calls to xmlPopInput
It's enough if xmlPopInput is called from xmlSkipBlankChars. Since the
replacement text of a parameter entity is surrounded with space
characters, that's the only place where the replacement can end in a
well-formed document.

This is also required to get rid of the "blanks wrapper" hack.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
aa267cd127 Simplify handling of parameter entity references
There are only two places where parameter entity references must be
handled. For the internal subset in xmlParseInternalSubset. For the
external subset or content from other external PEs in xmlSkipBlankChars.

Make sure that xmlSkipBlankChars skips over sequences of PEs and
whitespace. Rely on xmlSkipBlankChars instead of calling
xmlParsePEReference directly when in the external subset or a
conditional section.

xmlParserHandlePEReference is unused now.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
46dc989080 Don't switch encoding for internal parameter entities
This is only needed for external entities. Trying to switch the encoding
for internal entities could also cause a memory leak in recovery mode.
2017-06-17 13:23:40 +02:00
Nick Wellnhofer
79c8a6b105 Print error messages for truncated UTF-8 sequences
Before, truncated UTF-8 sequences at the end of a file were treated as
EOF. Create an error message containing the offending bytes.

xmlStringCurrentChar would also print characters from the input stream,
not the string it's working on.
2017-06-10 18:11:58 +02:00
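
The difference between a real EOF and a truncated UTF-8 sequence, as described in the commit above, can be illustrated with a minimal sketch. This is not the libxml2 code: the function names `sketch_utf8_expected_len` and `sketch_utf8_is_truncated` are hypothetical, and the sketch only checks whether the lead byte announces more bytes than remain in the buffer.

```c
#include <stddef.h>

/* Hypothetical helper, not part of libxml2: number of bytes announced by a
 * UTF-8 lead byte, or 0 if the byte cannot start a sequence. */
size_t sketch_utf8_expected_len(unsigned char lead) {
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* continuation or invalid byte */
}

/* Returns 1 if the sequence starting at cur is cut off by end,
 * 0 if it is complete, not a valid lead byte, or cur is at a genuine EOF. */
int sketch_utf8_is_truncated(const unsigned char *cur,
                             const unsigned char *end) {
    size_t need;

    if (cur >= end)
        return 0;                         /* nothing left: real EOF */
    need = sketch_utf8_expected_len(*cur);
    return need != 0 && (size_t)(end - cur) < need;
}
```

A caller could report the offending bytes when the second function returns 1 instead of silently treating the position as end of input.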
Nick Wellnhofer
f9e7997e80 Reset parser input pointers on encoding failure
Call xmlBufResetInput before bailing out if switching the encoding
fails. Otherwise, the input pointers are left in an invalid state.
This would typically lead to an internal error in xmlGROW but could also
cause other unforeseen problems.
2017-06-10 17:50:27 +02:00
Nick Wellnhofer
45ce1ee399 Add TODO comment in xmlSwitchEncoding
It would be nice if we could recover from unsupported encodings in
external entities.
2017-06-10 17:32:44 +02:00
Nick Wellnhofer
0db8dc9ddc Stop parser on unsupported encodings
Otherwise, the push parser can loop infinitely in recover mode.

Found with libFuzzer.
2017-06-07 19:30:56 +02:00
Pranjal Jumde
0bcd05c5cd Heap-based buffer overread in htmlCurrentChar
For https://bugzilla.gnome.org/show_bug.cgi?id=758606

* parserInternals.c:
(xmlNextChar): Add a test to catch other issues on ctxt->input
corruption proactively.
For non-UTF-8 charsets, xmlNextChar() failed to check for the end
of the input buffer and would continue reading.  Fix this by
pulling out the check for the end of the input buffer into common
code, and return if we reach the end of the input buffer
prematurely.
* result/HTML/758606.html: Added.
* result/HTML/758606.html.err: Added.
* result/HTML/758606.html.sax: Added.
* result/HTML/758606_2.html: Added.
* result/HTML/758606_2.html.err: Added.
* result/HTML/758606_2.html.sax: Added.
* test/HTML/758606.html: Added test case.
* test/HTML/758606_2.html: Added test case.
2016-05-23 15:01:07 +08:00
David Kilzer
4472c3a5a5 Fix some format string warnings with possible format string vulnerability
For https://bugzilla.gnome.org/show_bug.cgi?id=761029

Decorate every method in libxml2 with the appropriate
LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups
following the reports.
2016-05-23 15:01:07 +08:00
David Kilzer
d433ea6c83 Integer signed/unsigned type mismatch in xmlParserInputGrow()
For https://bugzilla.gnome.org/show_bug.cgi?id=766635

* parserInternals.c:
(xmlParserInputGrow): Change 'ret' type to 'int' to match the
return type of xmlParserInputBufferGrow().
2016-05-22 09:49:50 +08:00
Daniel Veillard
fdfeecc1b7 Bug on creating new stream from entity
sometimes the entity could have a length of 0, i.e. it wasn't
parsed or used yet, and we ended up with an incoherent input state
2015-11-20 15:07:38 +08:00
Daniel Veillard
afd27c21f6 Avoid processing entities after encoding conversion failures
For https://bugzilla.gnome.org/show_bug.cgi?id=756527
and was also raised by Chromium team in the past

When we hit a conversion failure when switching encoding
it is better to stop parsing there, this was treated as a
fatal error but the parser was continuing to process to extract
more errors, unfortunately that makes little sense as the data
is obviously corrupt and can potentially lead to unexpected behaviour.
2015-11-09 18:07:18 +08:00
Daniel Veillard
c35af8b18d Fixes for xmlInitParserCtxt
let's make sure that parser options are updated too when a corresponding
global variable or other field of the context is set.
2014-06-11 17:00:39 +08:00
Daniel Veillard
ff76eb28c7 Clear up a potential NULL dereference
https://bugzilla.gnome.org/show_bug.cgi?id=705399

if ctxt->node_seq.buffer is null then ctxt->node_seq.maximum ought
to be zero but it's better to clarify the check in the code directly.
2013-08-03 22:25:13 +08:00
Daniel Veillard
23f05e0c33 Detect excessive entities expansion upon replacement
If entities expansion in the XML parser is asked for,
it is possible to craft a relatively small input document leading
to excessive on-the-fly content generation.
This patch accounts for those replacements and stops parsing
after a given threshold. It can be bypassed as usual with the
HUGE parser option.
2013-02-19 10:21:49 +08:00
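
The accounting idea in the commit above can be sketched roughly as follows. This is an illustration only, not the libxml2 implementation: the struct, the function `sketch_account_expansion` and the threshold `SKETCH_MAX_EXPANSION` are all hypothetical, and the `huge` flag merely stands in for the HUGE parser option.

```c
#include <limits.h>

/* Arbitrary threshold chosen for this sketch, not the libxml2 value. */
#define SKETCH_MAX_EXPANSION 10000000UL

typedef struct {
    unsigned long expanded; /* bytes generated by entity replacement so far */
    int huge;               /* nonzero lifts the limit (stand-in for HUGE) */
    int stopped;            /* set once the threshold has been exceeded */
} sketch_ctxt;

/* Account for len bytes of replacement text.
 * Returns 0 if parsing may continue, -1 if it must stop. */
int sketch_account_expansion(sketch_ctxt *ctxt, unsigned long len) {
    if (ctxt->stopped)
        return -1;

    /* Saturating add so the counter cannot wrap around. */
    if (len > ULONG_MAX - ctxt->expanded)
        ctxt->expanded = ULONG_MAX;
    else
        ctxt->expanded += len;

    if (!ctxt->huge && ctxt->expanded > SKETCH_MAX_EXPANSION) {
        ctxt->stopped = 1;
        return -1;
    }
    return 0;
}
```

The counter lives in the parser context, so each document is limited independently of any other parse running in the same process.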
Daniel Veillard
bf058dce13 Fix the flushing out of raw buffers on encoding conversions
https://bugzilla.gnome.org/show_bug.cgi?id=692915

the new set of converting functions tried to limit the encoding
conversion of the raw buffer to the consumption one to work in
a more progressive fashion. Unfortunately this was bad for
performance and led to errors on progressive parsing when
a very large chunk was close to the end of the document. Fix
the new internal function and switch back to the old way of
converting. Fix another bug in the process.
2013-02-13 18:19:42 +08:00
Michael Wood
fb27e2cd20 Fix spelling of "length".
2012-10-30 10:18:49 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
52d8ade7a7 Introduce some default parser limits
Those can be overridden by the XML_PARSE_HUGE option, they
are just default limits for Name length, dictionary size limits
and maximum amount of parser lookup.
* include/libxml/parserInternals.h: define the limits
* include/libxml/xmlerror.h: add a new error
* parser.c parserInternals.c: implement the new limits
2012-07-30 10:08:45 +08:00
Daniel Veillard
61551a1eb7 Cleanup function xmlBufResetInput() to set input from Buffer
This was scattered across a number of modules: an xmlParserInputPtr
usually has its base, cur and end pointers set from an
xmlBuf used as input.
* buf.c buf.h: add a new function implementing this setup
* parser.c HTMLparser.c catalog.c parserInternals.c xmlreader.c
  use the new function instead of digging into the buffer in
  all those modules
2012-07-23 14:24:27 +08:00
Daniel Veillard
768eb3b82d Convert XML parser to the new input buffers
The main change is that where the internals of the buffer structure
were addressed directly, we now use routines coming from buf.h.
The routine xmlParserInputRead() which wasn't used anywhere is
deprecated too.
2012-07-23 14:24:26 +08:00
Daniel Veillard
0d51cfebc9 Fix a race in xmlNewInputStream
For https://bugzilla.gnome.org/show_bug.cgi?id=643148
Reported by Bill Clarke <llib@computer.org>, it used a global variable
as a counter for the input id and this was not thread safe. To avoid the
race without adding unneeded locking in the parser path, move the id to
the parser context instead.
2012-05-15 11:18:40 +08:00
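
The fix described in the commit above, moving the id counter from a global variable into the parser context, can be sketched as below. The type and function names are hypothetical stand-ins, not the libxml2 structures; the point is only that a per-context counter needs no locking as long as each context is used by one thread at a time.

```c
/* Hypothetical stand-in for a parser context: the counter is per-context
 * state instead of a process-wide global. */
typedef struct {
    int input_id;   /* next id to hand out within this context */
} sketch_parser_ctxt;

/* Hypothetical stand-in for an input stream. */
typedef struct {
    int id;
} sketch_input;

/* Assign the next id from the owning context. Two threads working on two
 * different contexts never touch the same counter. */
void sketch_init_input(sketch_parser_ctxt *ctxt, sketch_input *input) {
    input->id = ctxt->input_id++;
}
```

Ids are then unique only within one context rather than across the process, which is a trade-off of this design.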
Eugene Pimenov
615904f582 Switch the HTML parser to be non-recursive
* HTMLparser.c: new htmlParseElementInternal non recursive, with
  htmlParseContentInternal and new function to handle node info
  and element end.
* include/libxml/parser.h: add new stack for element info in parser
  context
* parserInternals.c: free element info stack
2010-03-15 15:16:02 +01:00
Daniel Veillard
7e385bd4e2 566012 autodetected encoding and encoding conflict
* encoding.c parser.c parserInternals.c: when we autodetect an encoding
  but it's actually not completely compatible with the one declared
  great care must be taken to not convert more than just the first line.
  Led to some refactoring, more private functions and a bit of cleanup.
2009-08-26 11:38:49 +02:00
Daniel Veillard
33c76c8312 Fix end of buffer char being split in XML parser
* parserInternals.c: similar patch to previous, reset cur on GROW
  in xmlNextChar and xmlCurrentChar
2009-08-25 11:30:34 +02:00
Nick Wellnhofer
2f522dc68f Fix xmlKeepBlanksDefault to not break indent
* parserInternals.c: the old compatibility function xmlKeepBlanksDefault()
  should not reset xmlIndentTreeOutput to 0 because the default is 1
2009-08-20 12:11:17 +02:00
Daniel Veillard
4bf899bf1b fix for CVE-2008-3281 Daniel
* include/libxml/parser.h include/libxml/entities.h entities.c
  parserInternals.c parser.c: fix for CVE-2008-3281
Daniel

svn path=/trunk/; revision=3772
2008-08-20 17:04:30 +00:00
Daniel Veillard
87303e3c7c applied patch from Ashwin to avoid a potential double-free Daniel
* parserInternals.c: applied patch from Ashwin to avoid a potential
  double-free
Daniel

svn path=/trunk/; revision=3741
2008-04-28 18:07:29 +00:00
Daniel Veillard
b3edafd72d avoid a warning on 64bits introduced earlier make more checking on the
* parser.c: avoid a warning on 64bits introduced earlier
* parserInternals.c: make more checking on the UTF-8 input
Daniel

svn path=/trunk/; revision=3676
2008-01-11 08:00:57 +00:00
Daniel Veillard
5addfebd06 applied patch from Marius Konitzer to avoid leaking in
* parserInternals.c: applied patch from Marius Konitzer to avoid
  leaking in xmlNewInputFromFile() in case of HTTP redirection
Daniel
2006-10-17 20:32:22 +00:00
Daniel Veillard
30e7607b7a a bunch of small cleanups based on coverity reports. Daniel
* HTMLparser.c parser.c parserInternals.c pattern.c uri.c: a bunch
  of small cleanups based on coverity reports.
Daniel
2006-03-09 14:13:55 +00:00
Daniel Veillard
6a0baa0cd8 fixed a number of warnings shown by HP-UX compiler and reported by Rick
* HTMLparser.c configure.in parserInternals.c runsuite.c runtest.c
  testapi.c xmlschemas.c xmlschemastypes.c xmlstring.c: fixed a number
  of warnings shown by HP-UX compiler and reported by Rick Jones
Daniel
2005-12-10 11:11:12 +00:00
Daniel Veillard
c19d535e5e removed unreachable code pointed out by Oleksandr Kononenko, fixes bug
* parserInternals.c: removed unreachable code pointed out by
  Oleksandr Kononenko, fixes bug #321695
Daniel
2005-11-17 13:12:16 +00:00
Daniel Veillard
6e84bb28dd fix a problem in some error case on Solaris when passed a NULL filename,
* parserInternals.c: fix a problem in some error case on Solaris
  when passed a NULL filename, pointed by Albert Chin.
Daniel
2005-10-26 09:00:29 +00:00
Daniel Veillard
2e7598cb06 avoid passing a char[] as snprintf first argument. implemented
* encoding.c parserInternals.c: avoid passing a char[] as snprintf
  first argument.
* threads.c include/libxml/threads.h: implemented xmlIsThreadsEnabled()
  based on Andrew W. Nosenko idea.
* doc/* elfgcchack.h: regenerated the API
Daniel
2005-09-02 12:28:34 +00:00
Daniel Veillard
75e389d4e0 more cleanups based on sparse reports, added "make sparse" Daniel
* Makefile.am globals.c parserInternals.c xmlreader.c xmlunicode.c
  xmlwriter.c: more cleanups based on sparse reports, added
  "make sparse"
Daniel
2005-07-29 22:02:24 +00:00
Daniel Veillard
304e78c6b4 fix bug raised by zamez on IRC regenerated, seems to pop-up leaks in new
* parserInternals.c: fix bug raised by zamez on IRC
* testapi.c: regenerated, seems to pop-up leaks in new tree functions
* tree.c: added comments missing.
* doc/*: regenerated
Daniel
2005-07-03 16:19:41 +00:00
Daniel Veillard
5d4644ef6e revamped the elfgcchack.h format to cope with gcc4 change of aliasing
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
  format to cope with gcc4 change of aliasing allowed scopes, had
  to add extra information to doc/libxml2-api.xml to separate
  the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
  and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
2005-04-01 13:11:58 +00:00
Aleksey Sanin
8fdc32abfe fixing col information in xmlParserInput and propagating column into xmlError
2005-01-05 15:37:55 +00:00
William M. Brack
1d8c9b291e fixed to skip (if necessary) the BOM for encoding 'utf-16'. Completes the
* parserInternals.c: fixed to skip (if necessary) the BOM for
  encoding 'utf-16'.  Completes the fix for bug #152286.
* tree.c, parser.c: minor warning cleanup, no change to logic
2004-12-25 10:14:57 +00:00
Daniel Veillard
a521d28751 better handling of conditional features more testing on parser contexts
* gentest.py testapi.c: better handling of conditional features
* HTMLparser.c SAX2.c parserInternals.c xmlwriter.c: more testing
  on parser contexts closed leaks, error messages
Daniel
2004-11-09 14:59:59 +00:00
Daniel Veillard
f2a36f98e1 more types. more fixes Daniel
* testapi.c: more types.
* parserInternals.c xpath.c: more fixes
Daniel
2004-11-08 17:55:01 +00:00
Daniel Veillard
2a4fb5ac07 more coverage more fixes Daniel
* gentest.py testapi.c: more coverage
* SAX2.c parser.c parserInternals.c: more fixes
Daniel
2004-11-08 14:02:18 +00:00
Daniel Veillard
4259532303 more types, more coverage more problems fixed Daniel
* gentest.py testapi.c: more types, more coverage
* parser.c parserInternals.c relaxng.c valid.c xmlIO.c
  xmlschemastypes.c: more problems fixed
Daniel
2004-11-08 10:52:06 +00:00
Daniel Veillard
ce682bc24b autogenerate a minimal NULL value sequence for unknown pointer types This
* gentest.py testapi.c: autogenerate a minimal NULL value sequence
  for unknown pointer types
* HTMLparser.c SAX2.c chvalid.c encoding.c entities.c parser.c
  parserInternals.c relaxng.c valid.c xmlIO.c xmlreader.c
  xmlsave.c xmlschemas.c xmlschemastypes.c xmlstring.c xpath.c
  xpointer.c: This uncovered an impressive amount of entry points
  not checking for NULL pointers when they ought to, closing all
  the open gaps.
Daniel
2004-11-05 17:22:25 +00:00
Daniel Veillard
36e5cd5064 adding xmlMemBlocks() work on generator of an automatic API regression
* xmlmemory.c include/libxml/xmlmemory.h: adding xmlMemBlocks()
* Makefile.am gentest.py testapi.c: work on generator of an
  automatic API regression test tool.
* SAX2.c nanoftp.c parser.c parserInternals.c tree.c xmlIO.c
  xmlstring.c: various API hardening changes as a result of running
  the first set of automatic API regression tests.
* test/slashdot16.xml: apparently missing from CVS, committed it
Daniel
2004-11-02 14:52:23 +00:00
Daniel Veillard
eff45a92da register xmlSchemaSetValidErrors, patch from Brent Hendricks in the
* python/libxml.c: register xmlSchemaSetValidErrors, patch from
  Brent Hendricks in the mailing-list
* include/libxml/valid.h HTMLparser.c SAX2.c valid.c
  parserInternals.c: fix #156626 and more generally how to find out
  if a validation context is part of a parsing context or not. This
  can probably be improved to make 100% sure that vctxt->userData
  is the parser context too. It's a bit hairy because we can't
  change the xmlValidCtxt structure without breaking the ABI since
  this changes xmlParserCtxt information indexes.
Daniel
2004-10-29 12:10:55 +00:00
Daniel Veillard
29b1748205 small typo pointed out by Mike Hommey slightly improved the --c14n
* xmlIO.c: small typo pointed out by Mike Hommey
* doc/xmllint.xml, xmllint.html, xmllint.1: slightly improved
  the --c14n description, c.f. #144675 .
* nanohttp.c nanoftp.c: applied a first simple patch from
  Mike Hommey for $no_proxy, c.f. #133470
* parserInternals.c include/libxml/parserInternals.h
  include/libxml/xmlerror.h: cleanup to avoid 'error' identifier
  in includes #
* parser.c SAX2.c debugXML.c include/libxml/parser.h:
  first version of the implementation of parsing within
  the context of a node in the tree #142359, new function
  xmlParseInNodeContext(), added support at the xmllint --shell
  level as the "set" function
* test/scripts/set* result/scripts/* Makefile.am: extended
  the script based regression tests to instrument the new function.
Daniel
2004-08-16 00:39:03 +00:00
Daniel Veillard
3671190b54 added xmlByteConsumed() interface updated the benchmark rebuilt the docs
* parserInternals.c xmlIO.c encoding.c include/libxml/parser.h
  include/libxml/xmlIO.h: added xmlByteConsumed() interface
* doc/*: updated the benchmark rebuilt the docs
* python/tests/Makefile.am python/tests/indexes.py: added a
  specific regression test for xmlByteConsumed()
* include/libxml/encoding.h rngparser.c tree.c: small cleanups
Daniel
2004-02-11 13:25:26 +00:00
Daniel Veillard
5bb9ccd56a remove the warning on the 2001 namespace remove some warnings when
* xinclude.c: remove the warning on the 2001 namespace
* parser.c parserInternals.c xpath.c: remove some warnings
  when compiling with MSVC6
* nanohttp.c: applied a patch when using _WINSOCKAPI_
Daniel
2004-02-09 12:39:02 +00:00