mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-26 00:37:43 +03:00

419 Commits

Author SHA1 Message Date
Daniel Veillard
7d4c529a33 Improve HTML escaping of attribute on output
Handle special cases of &{...} constructs as hinted in the spec
  http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
2012-09-05 12:11:43 +08:00
Daniel Veillard
968a03a2e5 Add support for big line numbers in error reporting
Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com>

* parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser
  option, not switched on by default; it's an opt-in
* SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers
  in the psvi field of text nodes
* tree.c: expand xmlGetLineNo to extract that information, also
  make sure we can't fail on recursive behaviour
* error.c: in __xmlRaiseError, if a node is provided, call
  xmlGetLineNo() if we can't get a valid line number.
* xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint
2012-08-13 12:41:33 +08:00
Daniel Veillard
28cc42d068 Regenerating docs and API files
Various cleanups
* configure.in: force regeneration of APIs in my environment
* buf.c buf.h enc.h encoding.c include/libxml/tree.h
  include/libxml/xmlerror.h save.h tree.c: various comment cleanups
  pointed by apibuild
* doc/apibuild.py: added the 3 new internal headers in the excludes
* doc/libxml2-api.xml doc/libxml2-refs.xml: regenerated the API
* doc/symbols.xml: listing new entry points for 2.9.0
* doc/devhelp/*: regenerated
2012-08-10 10:00:18 +08:00
Daniel Veillard
3e62adbe39 Adding various checks on node type through the API
Specifically checking against namespace nodes before accessing node
pointers
2012-08-09 14:24:02 +08:00
Daniel Veillard
6ca24a39d0 Namespace nodes can't be unlinked with xmlUnlinkNode 2012-08-08 15:31:55 +08:00
Daniel Veillard
c15df7d4ee Avoid using xmlBuffer for serialization
Mostly an optimization to avoid xmlBuffer->xmlBuf conversions
and use the new code.
2012-08-07 15:15:04 +08:00
Daniel Veillard
dddeede060 Provide new xmlBuf based saving functions
* include/libxml/tree.h: adds xmlBufGetNodeContent and xmlBufNodeDump
  as xmlBuf based equivalents of xmlNodeGetContent and xmlNodeDump
* tree.c: implements one new routine and converts xmlNodeBufGetContent
  to use the xmlBuf equivalent. It should behave better as a result
  in case of data larger than 2GB.
2012-07-23 14:24:27 +08:00
Daniel Veillard
94431ecba6 Fix various bugs in new code raised by the API checking
* testapi.c: regenerated and covering new APIs
* tree.c: xmlBufferDetach can't work on immutable buffers
* xzlib.c: fix a deallocation error
2012-05-15 10:45:05 +08:00
Daniel Veillard
79ee284abb Fix various problems with "make dist"
* tree.c: missing documentation for xmlBufferDetach
* doc/symbols.xml: add two new symbols xmlTextReaderRelaxNGValidateCtxt
                   and xmlBufferDetach
* doc/apibuild.py: ignore internal header xzlib.h
2012-05-15 10:25:31 +08:00
Conrad Irwin
7d0d2a50ac Use a hybrid allocation scheme in xmlNodeSetContent
On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote:
>  Hi Conrad,
>
> that's interesting ! I was initially afraid of a sudden explosion of
> memory allocations for building a tree since by default buffers tend to
> "waste" memory by using doubling allocations, but that's not the case.
>  xmllint --noout doc/libxml2-api.xml
> when compiled with memory debug produce
>
> paphio:~/XML -> cat .memdump
>      MEMORY ALLOCATED : 0, MAX was 12756699
>
> and without your patch 12755657, i.e. the increase is minimal.

Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This
is because EXACT adds 10bytes "spare" on each alloc, and that interestingly wastes about the
same amount of space as XML_ALLOC_DOUBLEIT on this example (see below).

So it turns out that the default realloc() on my system actually handles this case really
well — and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the
underlying realloc() after all (sorry for misleading you). If you replace the realloc()
with a bad one (like valgrind's), then the performance degrades severely.

This patch implements a HYBRID allocator which has the behaviour you describe (it's
like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT
after 4kb) — that gets the memory back down to 12755657, with no noticeable impact on the
performance of the synthetic pathological example under valgrind.

In summary:

     max_memory on ./xmllint --noout doc/libxml2-api.xml,
     valgrind time on https://gist.github.com/2656940

         |  max_memory  | valgrind time
before   |  12755657    | 29:18.2
EXACT    |  12756699    |  2:58.6 <-- this is the state after the first patch.
DOUBLEIT |  12756727    |  0:02.7
HYBRID   |  12755754    |  0:02.7 <-- this is the state with both patches.

>
> There is also the cost of creating the buffers all the time.
> I need to read the code and check but I may be interested in a hybrid
> approach where we switch to buffer only when the text node starts to
> become too big (4k would remove nearly all usual types of "document"
> usage, i.e. not blocks of data)

I tried to avoid too much buffer creation by introducing the xmlBufferDetach function,
which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack"
in API terms though I thought the gains would be worth it.

Conrad

------8<------

To keep memory usage tight in normal conditions it's desirable to only
allocate as much space as is needed. Unfortunately this can lead to
problems when constructing a long string out of small chunks, because
every chunk you add will need to resize the buffer.

To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big)
from using exact allocations to doubling buffer size every time it is
full. This limits the number of buffer resizes to O(log n) (down from
O(n)), and thus greatly increases the performance of constructing very
large strings in this manner.
2012-05-14 14:18:58 +08:00
Conrad Irwin
7d553f834e Use buffers when constructing string node lists.
Hi Veillard and all,

Firstly, thanks for libxml: it's awesome!

I noticed recently that libxml was taking a surprisingly long time to perform some
operations (many minutes instead of milliseconds), and so I did some digging. It turns out
that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can
be called many (many) times when assigning some content into a node.

For background, I'm dealing with XML that contains emails, these can have large
attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends
with &#13;. This means that xmlNodeAddContentLen() is being called about 200,000 times,
and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic
example of this at https://gist.github.com/2656940)

The attached patch works around that problem by using the existing buffer API to merge the
strings together before even creating the text node, this keeps the number of realloc()s
at a manageable level.

I'd love feedback on the patch, and am happy to fix problems with it, or explore other
solutions if you think that this is barking up the wrong tree :).

Thanks,

Conrad

P.S. Should I create a bug for this too?

------8<------

Before this change xmlStringGetNodeList would perform a realloc() of the
entire new content for every XML entity in the assigned text in order to
merge together adjacent text nodes. This had the effect of making
xmlNodeSetContent O(n^2), which led to unexpectedly bad performance on
inputs that contained a large number of XML entities.

After this change the memory management is done by the buffer API,
avoiding the need to continually re-measure and realloc() the string.

For my test data (6MB of 80 character lines, each ending with &#13;)
this cuts the time for xmlNodeSetContent from about 500 seconds to
around 50ms. I have not profiled smaller cases, though I tried to
minimize the performance impact of my change by avoiding unnecessary
string copying.

Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
2012-05-14 13:51:30 +08:00
Daniel Veillard
39d027cdb7 Fix html serialization error and htmlSetMetaEncoding()
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The python tests were reporting errors, some of it was due to
a small change in case encoding, but the main one was about
htmlSetMetaEncoding(doc, NULL) being broken by not removing
the associated meta tag anymore
2012-05-11 12:38:23 +08:00
Daniel Veillard
a6b14bf9fd Clarify the need to use xmlFreeNode after xmlUnlinkNode
Just add one small sentence to the xmlUnlinkNode function comments
2012-01-26 17:44:35 +08:00
Daniel Veillard
aa54d37cd7 Fix handling of XML-1.0 XML namespace declaration
Usually 'xml' namespace for XML-1.0 declaration does not need
to be carried but Mike Hommey raised the problem that the SVG
XSD file fails to parse due to a mishandling.
- SAX2.c: failure to create a namespace should not be interpreted
  as a memory allocation error
- tree.c: document better xmlNewNs behaviour, and fix it in the
  case the 'xml' prefix is being used.
2010-09-09 18:17:47 +02:00
Daniel Veillard
e4d1849cd8 Fix xmlNodeSetBase() comment 2010-03-09 11:12:30 +01:00
François Delyon
2f70090864 xmlPreviousElementSibling mistake
* tree.c: xmlPreviousElementSibling it should look for preceding sibling
  never for the following ones...
2010-02-03 17:32:37 +01:00
Rob Richards
ddb01cbf61 Fix lost namespace when copying node
* tree.c: reconcile namespace if not found
2010-01-29 13:32:12 -05:00
Martin Trappel
f370310542 Fix a const warning in xmlNodeSetBase
* tree.c: xmlNodeSetName: Remove const from declaration since it is
  used non-const anyway. Remove unnecessary cast on xmlFree later on.
2010-01-22 12:08:00 +01:00
Daniel Veillard
594e5dfb48 Chasing dead assignments reported by clang-scan
* SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c
  relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c
  xmlschemas.c xpath.c xpointer.c: mostly removing unneeded assignments,
  but this led to a few real bugs and some parts not yet understood
  (relaxng/interleave)
2009-09-07 14:58:47 +02:00
Daniel Veillard
76d364583e Fixing assorted potential problems raised by scan
* encoding.c parser.c relaxng.c runsuite.c tree.c xmlreader.c
  xmlschemas.c: nothing really serious but better safe than sorry
2009-09-07 11:19:33 +02:00
Daniel Veillard
ee20cd7ec9 574017 Realloc too expensive on most platform
* tree.c: even on BSD there is too much of a penalty hit, so use
  the doubling buffer size strategy on all arches, not just Windows.
2009-08-22 15:18:31 +02:00
Daniel Veillard
8ed1072c2d Add symbol versioning to libxml2 shared libs
* libxml2.syms: the symbols with history, going back to 2.4.30
* Makefile.am configure.in: linking flags detection and use
* parser.c tree.c valid.c xpointer.c: various cleanup of functions
  which could be made static or simply discarded, not that many
2009-08-20 19:17:36 +02:00
Petr Pajas
2afca4a1c4 Preserve attributes of include start on tree copy
* tree.c: copy attributes and namespaces for that kind of node
2009-07-30 17:47:32 +02:00
Daniel Veillard
ab2a763db8 A bit of cleanups
* tree.c: avoid calling xmlAddID with NULL values
* parser.c: add a few xmlInitParser in some entry points
2009-07-09 08:45:03 +02:00
Daniel Veillard
43bc89c1e3 add a missing check in xmlAddSibling, patch by Kris Breuker avoid
* tree.c: add a missing check in xmlAddSibling, patch by Kris Breuker
* xmlIO.c: avoid xmlAllocOutputBuffer using XML_BUFFER_EXACT which
  leads to performance problems especially on Windows.
daniel

svn path=/trunk/; revision=3820
2009-03-23 19:32:04 +00:00
Rob Richards
810a78b305 set doc on last child tree in xmlAddChildList for bug #546772. Fix problem
* tree.c: set doc on last child tree in xmlAddChildList for
  bug #546772. Fix problem adding an attribute via xmlAddChild
  reported by Kris Breuker.

svn path=/trunk/; revision=3806
2008-12-31 22:13:57 +00:00
Daniel Veillard
be2bd6ac6f adds element traversal support avoid a warning regenerated daniel
* include/libxml/tree.h tree.c python/generator.py: adds
  element traversal support
* valid.c: avoid a warning
* doc/*: regenerated
daniel

svn path=/trunk/; revision=3804
2008-11-27 15:26:28 +00:00
Daniel Veillard
1dc9feb00f fix for CVE-2008-4226, a memory overflow when building gigantic text
* SAX2.c parser.c: fix for CVE-2008-4226, a memory overflow
  when building gigantic text nodes, and a bit of cleanup
  to better handle out-of-memory problems in that code.
* tree.c: fix for CVE-2008-4225, lack of testing leads to
  a busy loop test assuming one has enough core memory.
Daniel

svn path=/trunk/; revision=3803
2008-11-17 15:59:21 +00:00
Daniel Veillard
da3fee406d Borland C fix from Moritz Both regenerate, workaround a problem for buffer
* trionan.c: Borland C fix from Moritz Both
* testapi.c: regenerate, workaround a problem for buffer testing
* xmlIO.c HTMLtree.c: new internal entry point to hide even better
  xmlAllocOutputBufferInternal
* tree.c: harden the code around buffer allocation schemes
* parser.c: restore the warning when namespace names are not absolute
  URIs
* runxmlconf.c: continue regression tests if we get the expected
  number of errors
* Makefile.am: run the python tests on make check
* xmlsave.c: handle the HTML documents and trees
* python/libxml.c: convert python serialization to the xmlSave APIs
  and avoid some horrible hacks
Daniel

svn path=/trunk/; revision=3790
2008-09-01 13:08:57 +00:00
Daniel Veillard
1572425c27 preparing 2.7.0 release remove some testing traces remove some warnings
* configure.in, doc/*: preparing 2.7.0 release
* tree.c: remove some testing traces
* parser.c xmlIO.c xmlschemas.c: remove some warnings
Daniel

svn path=/trunk/; revision=3788
2008-08-30 15:01:04 +00:00
Daniel Veillard
e83e93e715 make a new kind of buffer where shrinking and adding in head can avoid
* include/libxml/tree.h tree.c: make a new kind of buffer where
  shrinking and adding in head can avoid reallocation or full
  buffer memmoves
* encoding.c xmlIO.c: use the new kind of buffers for output
  buffers
Daniel

svn path=/trunk/; revision=3787
2008-08-30 12:52:26 +00:00
Daniel Veillard
2cba415895 fix a small initialization problem raised by Ashwin increase testing
* threads.c: fix a small initialization problem raised by Ashwin
* testapi.c gentest.py: increase testing especially for document
  with an internal subset, and entities
* tree.c: fix a deallocation issue when unlinking entities from
  a document.
* valid.c: fix a missing entry point test not found previously.
* doc/*: regenerated the APIs, docs etc.
daniel

svn path=/trunk/; revision=3778
2008-08-27 11:45:41 +00:00
Daniel Veillard
aa6de47ebf applied patch from Aswin to fix tree skipping fixed a comment and added a
* xmlreader.c: applied patch from Aswin to fix tree skipping
* include/libxml/entities.h entities.c: fixed a comment and
  added a new xmlNewEntity() entry point
* runtest.c: be less verbose
* tree.c: space and tabs cleanups
daniel

svn path=/trunk/; revision=3774
2008-08-25 14:53:31 +00:00
Daniel Veillard
ae0765b681 more progress against the official regression tests small cleanup for
* runxmlconf.c: more progress against the official regression tests
* runsuite.c: small cleanup for non-leak reports
* include/libxml/tree.h: parsing flags and other properties are
  now added to the document node, this is generally useful and
  allows making Name and NmToken validations based on the parser
  flags, more specifically the 5th edition of XML or not
* HTMLparser.c tree.c: small side effects for the previous changes
* parser.c SAX2.c valid.c: the bulk of the changes are here,
  the parser and validation behaviour can be affected, parsing
  flags need to be copied, lot of changes. Also fixing various
  validation problems in the regression tests.
Daniel

svn path=/trunk/; revision=3762
2008-07-31 19:54:59 +00:00
Daniel Veillard
ed939f8e06 fix a bug introduced when fixing #438208 and reported by Ashwin fix an
* tree.c: fix a bug introduced when fixing #438208 and reported by
  Ashwin
* python/generator.py: fix an infinite loop bug
Daniel

svn path=/trunk/; revision=3733
2008-04-08 08:20:08 +00:00
Daniel Veillard
8f6c2b1163 fix some problems with the *EatName functions when running out of memory
* tree.c: fix some problems with the *EatName functions when
  running out of memory raised by Eric Schrock , should fix #438208
Daniel

svn path=/trunk/; revision=3729
2008-04-03 11:17:21 +00:00
Daniel Veillard
6f8611fdb4 patch from Julien Charbon to simplify the processing of xmlSetProp()
* include/libxml/xmlerror.h tree.c: patch from Julien Charbon
  to simplify the processing of xmlSetProp()
Daniel

svn path=/trunk/; revision=3694
2008-02-15 08:33:21 +00:00
William M. Brack
38d452ac1c Fixed typo in xmlCharEncFirstLine pointed out by Mark Rowe (bug #440159)
* encoding.c: Fixed typo in xmlCharEncFirstLine pointed out
  by Mark Rowe (bug #440159)
* include/libxml/xmlversion.h.in: Added check for definition of
  _POSIX_C_SOURCE to avoid warnings on Apple OS/X (patch from
  Wendy Doyle and Mark Rowe, bug #346675)
* schematron.c, testapi.c, tree.c, xmlIO.c, xmlsave.c: minor
  changes to fix compilation warnings - no change to logic.

svn path=/trunk/; revision=3618
2007-05-22 16:00:06 +00:00
Daniel Veillard
c9923324e9 Richard Jones reported xmlBufferAdd (buf, "", -1), fixing it Daniel
* tree.c: Richard Jones reported xmlBufferAdd (buf, "", -1), fixing it
Daniel

svn path=/trunk/; revision=3605
2007-04-24 18:12:06 +00:00
Daniel Veillard
0e05f4c2e0 applied documentation patches from Markus Keim fixed one bug and added a
* tree.c: applied documentation patches from Markus Keim
* xmlregexp.c: fixed one bug and added a couple of optimisations
  while working on bug #362989
Daniel
2006-11-01 15:33:04 +00:00
Daniel Veillard
26a45c815a fix comment for xmlDocSetRootElement c.f. #351981 order XPath elements
* tree.c: fix comment for xmlDocSetRootElement c.f. #351981
* xmllint.c: order XPath elements when using --shell
Daniel
2006-10-20 12:55:34 +00:00
Daniel Veillard
b5f1197ce2 fixing bug #344390 with xmlReconciliateNs Daniel
* tree.c: fixing bug #344390 with xmlReconciliateNs
Daniel
2006-10-14 08:46:40 +00:00
Daniel Veillard
f1a27c659e added --html --memory to test htmlReadMemory to test #321632 added various
* xmllint.c: added --html --memory to test htmlReadMemory to
  test #321632
* HTMLparser.c: added various initialization calls which may help
  #321632 but not conclusive
* testapi.c tree.c include/libxml/tree.h: fixed compilation with
  --with-minimum --with-sax1 and --with-minimum --with-schemas
  fixing #326442
Daniel
2006-10-13 22:33:03 +00:00
Daniel Veillard
b8efdda0a3 add a new function xmlPathToUri() to provide a clean conversion when
* uri.c include/libxml/uri.h: add a new function xmlPathToUri()
  to provide a clean conversion when setting up a base
* SAX2.c tree.c: use said function when setting up doc->URL
  or using the xmlSetBase function. Should fix #346261
Daniel
2006-10-10 12:37:14 +00:00
Rob Richards
a02f199d7b xmlTextConcat works with comments and PI nodes (bug #355962). fix
* tree.c: xmlTextConcat works with comments and PI nodes (bug #355962).
* parser.c: fix resulting tree corruption when using XML namespace
  with existing doc in xmlParseBalancedChunkMemoryRecover.
2006-09-16 14:04:26 +00:00
Kasimier T. Buchcik
978039bbd8 Fixed a bug in xmlDOMWrapAdoptNode(); the tree traversal stopped if the
* tree.c include/libxml/tree.h: Fixed a bug in
  xmlDOMWrapAdoptNode(); the tree traversal stopped if the
  very first given node had an attribute node :-( This was due
  to a missed check in the traversal mechanism.
  Expanded the xmlDOMWrapCtxt: it now holds the namespace map
  used in xmlDOMWrapAdoptNode() and xmlDOMWrapCloneNode() for
  reuse; so the map-items don't need to be created for every
  cloning/adoption. Added a callback function to it for
  retrieval of xmlNsPtr to be set on node->ns; this is needed
  for my custom handling of ns-references in my DOM wrapper.
  Substituted code which created the XML namespace decl on
  the doc for a call to xmlTreeEnsureXMLDecl(). Removed
  those nasty "warnings" from the docs of the clone/adopt
  functions; they work fine on my side.
2006-06-16 19:46:26 +00:00
Kasimier T. Buchcik
43ceb1ec88 Got rid of a compiler warning in xmlGetNodePath().
* tree.c: Got rid of a compiler warning in xmlGetNodePath().
2006-06-12 11:08:18 +00:00
Kasimier T. Buchcik
d38c63f329 Fixed xmlGetNodePath() to generate the node test "*" for elements in the
* tree.c: Fixed xmlGetNodePath() to generate the node test "*"
  for elements in the default namespace, rather than generating
  an unprefixed named node test and losing the namespace
  information.
2006-06-12 10:58:24 +00:00
Rob Richards
a512d76edc Revert behavior change in xmlSetProp to handle attributes with colons in
* tree.c: Revert behavior change in xmlSetProp to handle attributes
  with colons in name and no namespace.
2006-05-22 11:34:44 +00:00
Daniel Veillard
b2f8f1de7a preparing 2.6.24 release, fixed Python paths at the last moment fix some
* NEWS configure.in doc//*: preparing 2.6.24 release, fixed Python
  paths at the last moment
* relaxng.c testapi.c tree.c: fix some comments
Daniel
2006-04-28 16:30:48 +00:00