libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-10-24 13:33:01 +03:00

Author	SHA1	Message	Date
Conrad Irwin	7d0d2a50ac	Use a hybrid allocation scheme in xmlNodeSetContent On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote: > Hi Conrad, > > that's interesting ! I was initially afraid of a sudden explosion of > memory allocations for building a tree since by default buffers tend to > "waste" memory by using doubling allocations, but that's not the case. > xmllint --noout doc/libxml2-api.xml > when compiled with memory debug produce > > paphio:~/XML -> cat .memdump > MEMORY ALLOCATED : 0, MAX was 12756699 > > and without your patch 12755657, i.e. the increase is minimal. Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This is because EXACT adds 10 bytes "spare" on each alloc, and that interestingly wastes about the same amount of space as XML_ALLOC_DOUBLEIT on this example (see below). So it turns out that the default realloc() on my system actually handles this case really well, and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the underlying realloc() after all (sorry for misleading you). If you replace the realloc() with a bad one (like valgrind's), then the performance degrades severely. This patch implements a HYBRID allocator which has the behaviour you describe (it's like EXACT to start with, though without the spare 10 bytes, and switches to DOUBLEIT after 4kb); that gets the memory back down to 12755657, with no noticeable impact on the performance of the synthetic pathological example under valgrind. In summary (max_memory on ./xmllint --noout doc/libxml2-api.xml | valgrind time on https://gist.github.com/2656940): before | 12755657 | 29:18.2; EXACT | 12756699 | 2:58.6 (this is the state after the first patch); DOUBLEIT | 12756727 | 0:02.7; HYBRID | 12755754 | 0:02.7 (this is the state with both patches). > > There is also the cost of creating the buffers all the time. > I need to read the code and check but I may be interested in a hybrid > approach where we switch to buffer only when the text node starts to > become too big (4k would remove nearly all usual types of "document" > usage, i.e. not blocks of data) I tried to avoid too much buffer creation by introducing the xmlBufferDetach function, which allows re-using one buffer to construct many strings. It's maybe a bit of a "hack" in API terms though I thought the gains would be worth it. Conrad ------8<------ To keep memory usage tight in normal conditions it's desirable to only allocate as much space as is needed. Unfortunately this can lead to problems when constructing a long string out of small chunks, because every chunk you add will need to resize the buffer. To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big) from using exact allocations to doubling buffer size every time it is full. This limits the number of buffer resizes to O(log n) (down from O(n)), and thus greatly increases the performance of constructing very large strings in this manner.	2012-05-14 14:18:58 +08:00
Conrad Irwin	7d553f834e	Use buffers when constructing string node lists. Hi Veillard and all, Firstly, thanks for libxml: it's awesome! I noticed recently that libxml was taking a surprisingly long time to perform some operations (many minutes instead of milliseconds), and so I did some digging. It turns out that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can be called many (many) times when assigning some content into a node. For background, I'm dealing with XML that contains emails; these can have large attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends with . This means that xmlNodeAddContentLen() is being called about 200,000 times, and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic example of this at https://gist.github.com/2656940) The attached patch works around that problem by using the existing buffer API to merge the strings together before even creating the text node; this keeps the number of realloc()s at a manageable level. I'd love feedback on the patch, and am happy to fix problems with it, or explore other solutions if you think that this is barking up the wrong tree :). Thanks, Conrad P.S. Should I create a bug for this too? ------8<------ Before this change xmlStringGetNodeList would perform a realloc() of the entire new content for every XML entity in the assigned text in order to merge together adjacent text nodes. This had the effect of making xmlNodeSetContent O(n^2), which led to unexpectedly bad performance on inputs that contained a large number of XML entities. After this change the memory management is done by the buffer API, avoiding the need to continually re-measure and realloc() the string. For my test data (6MB of 80 character lines, each ending with ) this reduces the time for xmlNodeSetContent from about 500 seconds to around 50ms. I have not profiled smaller cases, though I tried to minimize the performance impact of my change by avoiding unnecessary string copying. Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>	2012-05-14 13:51:30 +08:00
Daniel Veillard	39d027cdb7	Fix html serialization error and htmlSetMetaEncoding() For https://bugzilla.gnome.org/show_bug.cgi?id=630682 The python tests were reporting errors, some of it was due to a small change in case encoding, but the main one was about htmlSetMetaEncoding(doc, NULL) being broken by not removing the associated meta tag anymore	2012-05-11 12:38:23 +08:00
Daniel Veillard	a6b14bf9fd	Clarify the need to use xmlFreeNode after xmlUnlinkNode Just add one small sentence to the xmlUnlinkNode function comments	2012-01-26 17:44:35 +08:00
Daniel Veillard	aa54d37cd7	Fix handling of XML-1.0 XML namespace declaration Usually 'xml' namespace for XML-1.0 declaration does not need to be carried but Mike Hommey raised the problem that the SVG XSD file fails to parse due to a mishandling. - SAX2.c: failure to create a namespace should not be interpreted as a memory allocation error - tree.c: better document xmlNewNs behaviour, and fix it for the case where the 'xml' prefix is used.	2010-09-09 18:17:47 +02:00
Daniel Veillard	e4d1849cd8	Fix xmlNodeSetBase() comment	2010-03-09 11:12:30 +01:00
François Delyon	2f70090864	xmlPreviousElementSibling mistake * tree.c: xmlPreviousElementSibling it should look for preceding sibling never for the following ones...	2010-02-03 17:32:37 +01:00
Rob Richards	ddb01cbf61	Fix lost namespace when copying node * tree.c: reconcile namespace if not found	2010-01-29 13:32:12 -05:00
Martin Trappel	f370310542	Fix a const warning in xmlNodeSetBase * tree.c: xmlNodeSetName: Remove const from declaration since it is used non-const anyway. Remove unnecessary cast on xmlFree later on.	2010-01-22 12:08:00 +01:00
Daniel Veillard	594e5dfb48	Chasing dead assignments reported by clang-scan * SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c xmlschemas.c xpath.c xpointer.c: mostly removing unneeded assignments, but this led to a few real bugs and some parts not yet understood (relaxng/interleave)	2009-09-07 14:58:47 +02:00
Daniel Veillard	76d364583e	Fixing assorted potential problems raised by scan * encoding.c parser.c relaxng.c runsuite.c tree.c xmlreader.c xmlschemas.c: nothing really serious but better safe than sorry	2009-09-07 11:19:33 +02:00
Daniel Veillard	ee20cd7ec9	574017 Realloc too expensive on most platform * tree.c: even on BSD there is too much of a penalty hit, to use the doubling buffer size strategy on all arches not just Windows.	2009-08-22 15:18:31 +02:00
Daniel Veillard	8ed1072c2d	Add symbol versioning to libxml2 shared libs * libxml2.syms: the symbols with history, going back to 2.4.30 * Makefile.am configure.in: linking flags detection and use * parser.c tree.c valid.c xpointer.c: various cleanup of functions which could be made static or simply discarded, not that many	2009-08-20 19:17:36 +02:00
Petr Pajas	2afca4a1c4	Preserve attributes of include start on tree copy * tree.c: copy attributes and namespaces for that kind of node	2009-07-30 17:47:32 +02:00
Daniel Veillard	ab2a763db8	A bit of cleanups * tree.c: avoid calling xmlAddID with NULL values * parser.c: add a few xmlInitParser in some entry points	2009-07-09 08:45:03 +02:00
Daniel Veillard	43bc89c1e3	add a missing check in xmlAddSibling, patch by Kris Breuker avoid * tree.c: add a missing check in xmlAddSibling, patch by Kris Breuker * xmlIO.c: avoid xmlAllocOutputBuffer using XML_BUFFER_EXACT which leads to performance problems especially on Windows. daniel svn path=/trunk/; revision=3820	2009-03-23 19:32:04 +00:00
Rob Richards	810a78b305	set doc on last child tree in xmlAddChildList for bug #546772. Fix problem * tree.c: set doc on last child tree in xmlAddChildList for bug #546772. Fix problem adding an attribute via xmlAddChild reported by Kris Breuker. svn path=/trunk/; revision=3806	2008-12-31 22:13:57 +00:00
Daniel Veillard	be2bd6ac6f	adds element traversal support avoid a warning regenerated daniel * include/libxml/tree.h tree.c python/generator.py: adds element traversal support * valid.c: avoid a warning * doc/*: regenerated daniel svn path=/trunk/; revision=3804	2008-11-27 15:26:28 +00:00
Daniel Veillard	1dc9feb00f	fix for CVE-2008-4226, a memory overflow when building gigantic text * SAX2.c parser.c: fix for CVE-2008-4226, a memory overflow when building gigantic text nodes, and a bit of cleanup to better handle out-of-memory problems in that code. * tree.c: fix for CVE-2008-4225, lack of testing leads to a busy loop test assuming one has enough core memory. Daniel svn path=/trunk/; revision=3803	2008-11-17 15:59:21 +00:00
Daniel Veillard	da3fee406d	Borland C fix from Moritz Both regenerate, workaround a problem for buffer * trionan.c: Borland C fix from Moritz Both * testapi.c: regenerate, workaround a problem for buffer testing * xmlIO.c HTMLtree.c: new internal entry point to hide even better xmlAllocOutputBufferInternal * tree.c: harden the code around buffer allocation schemes * parser.c: restore the warning when namespace names are not absolute URIs * runxmlconf.c: continue regression tests if we get the expected number of errors * Makefile.am: run the python tests on make check * xmlsave.c: handle the HTML documents and trees * python/libxml.c: convert python serialization to the xmlSave APIs and avoid some horrible hacks Daniel svn path=/trunk/; revision=3790	2008-09-01 13:08:57 +00:00
Daniel Veillard	1572425c27	preparing 2.7.0 release remove some testing traces remove some warnings * configure.in, doc/: preparing 2.7.0 release tree.c: remove some testing traces * parser.c xmlIO.c xmlschemas.c: remove some warnings Daniel svn path=/trunk/; revision=3788	2008-08-30 15:01:04 +00:00
Daniel Veillard	e83e93e715	make a new kind of buffer where shrinking and adding in head can avoid * include/libxml/tree.h tree.c: make a new kind of buffer where shrinking and adding in head can avoid reallocation or full buffer memmoves * encoding.c xmlIO.c: use the new kind of buffers for output buffers Daniel svn path=/trunk/; revision=3787	2008-08-30 12:52:26 +00:00
Daniel Veillard	2cba415895	fix a small initialization problem raised by Ashwin increase testing * threads.c: fix a small initialization problem raised by Ashwin * testapi.c gentest.py: increase testing especially for document with an internal subset, and entities * tree.c: fix a deallocation issue when unlinking entities from a document. * valid.c: fix a missing entry point test not found previously. * doc/*: regenerated the APIs, docs etc. daniel svn path=/trunk/; revision=3778	2008-08-27 11:45:41 +00:00
Daniel Veillard	aa6de47ebf	applied patch from Aswin to fix tree skipping fixed a comment and added a * xmlreader.c: applied patch from Aswin to fix tree skipping * include/libxml/entities.h entities.c: fixed a comment and added a new xmlNewEntity() entry point * runtest.c: be less verbose * tree.c: space and tabs cleanups daniel svn path=/trunk/; revision=3774	2008-08-25 14:53:31 +00:00
Daniel Veillard	ae0765b681	more progresses against the official regression tests small cleanup for * runxmlconf.c: more progress against the official regression tests * runsuite.c: small cleanup for non-leak reports * include/libxml/tree.h: parsing flags and other properties are now added to the document node, this is generally useful and allows making Name and NmToken validations based on the parser flags, more specifically the 5th edition of XML or not * HTMLparser.c tree.c: small side effects for the previous changes * parser.c SAX2.c valid.c: the bulk of the changes are here, the parser and validation behaviour can be affected, parsing flags need to be copied, lots of changes. Also fixing various validation problems in the regression tests. Daniel svn path=/trunk/; revision=3762	2008-07-31 19:54:59 +00:00
Daniel Veillard	ed939f8e06	fix a bug introduced when fixing #438208 and reported by Ashwin fix an * tree.c: fix a bug introduced when fixing #438208 and reported by Ashwin * python/generator.py: fix an infinite loop bug Daniel svn path=/trunk/; revision=3733	2008-04-08 08:20:08 +00:00
Daniel Veillard	8f6c2b1163	fix some problems with the *EatName functions when running out of memory tree.c: fix some problems with the *EatName functions when running out of memory raised by Eric Schrock, should fix #438208 Daniel svn path=/trunk/; revision=3729	2008-04-03 11:17:21 +00:00
Daniel Veillard	6f8611fdb4	patch from Julien Charbon to simplify the processing of xmlSetProp() * include/libxml/xmlerror.h tree.c: patch from Julien Charbon to simplify the processing of xmlSetProp() Daniel svn path=/trunk/; revision=3694	2008-02-15 08:33:21 +00:00
William M. Brack	38d452ac1c	Fixed typo in xmlCharEncFirstLine pointed out by Mark Rowe (bug #440159 ) * encoding.c: Fixed typo in xmlCharEncFirstLine pointed out by Mark Rowe (bug #440159) * include/libxml/xmlversion.h.in: Added check for definition of _POSIX_C_SOURCE to avoid warnings on Apple OS/X (patch from Wendy Doyle and Mark Rowe, bug #346675) * schematron.c, testapi.c, tree.c, xmlIO.c, xmlsave.c: minor changes to fix compilation warnings - no change to logic. svn path=/trunk/; revision=3618	2007-05-22 16:00:06 +00:00
Daniel Veillard	c9923324e9	Richard Jones reported xmlBufferAdd (buf, "", -1), fixing it Daniel * tree.c: Richard Jones reported xmlBufferAdd (buf, "", -1), fixing it Daniel svn path=/trunk/; revision=3605	2007-04-24 18:12:06 +00:00
Daniel Veillard	0e05f4c2e0	applied documentation patches from Markus Keim fixed one bug and added a * tree.c: applied documentation patches from Markus Keim * xmlregexp.c: fixed one bug and added a couple of optimisations while working on bug #362989 Daniel	2006-11-01 15:33:04 +00:00
Daniel Veillard	26a45c815a	fix comment for xmlDocSetRootElement c.f. #351981 order XPath elements * tree.c: fix comment for xmlDocSetRootElement c.f. #351981 * xmllint.c: order XPath elements when using --shell Daniel	2006-10-20 12:55:34 +00:00
Daniel Veillard	b5f1197ce2	fixing bug #344390 with xmlReconciliateNs Daniel * tree.c: fixing bug #344390 with xmlReconciliateNs Daniel	2006-10-14 08:46:40 +00:00
Daniel Veillard	f1a27c659e	added --html --memory to test htmlReadMemory to test #321632 added various * xmllint.c: added --html --memory to test htmlReadMemory to test #321632 * HTMLparser.c: added various initialization calls which may help #321632 but not conclusive * testapi.c tree.c include/libxml/tree.h: fixed compilation with --with-minimum --with-sax1 and --with-minimum --with-schemas fixing #326442 Daniel	2006-10-13 22:33:03 +00:00
Daniel Veillard	b8efdda0a3	add a new function xmlPathToUri() to provide a clean conversion when * uri.c include/libxml/uri.h: add a new function xmlPathToUri() to provide a clean conversion when setting up a base * SAX2.c tree.c: use said function when setting up doc->URL or using the xmlSetBase function. Should fix #346261 Daniel	2006-10-10 12:37:14 +00:00
Rob Richards	a02f199d7b	xmlTextConcat works with comments and PI nodes (bug #355962 ). fix * tree.c: xmlTextConcat works with comments and PI nodes (bug #355962). * parser.c: fix resulting tree corruption when using XML namespace with existing doc in xmlParseBalancedChunkMemoryRecover.	2006-09-16 14:04:26 +00:00
Kasimier T. Buchcik	978039bbd8	Fixed a bug in xmlDOMWrapAdoptNode(); the tree traversal stopped if the * tree.c include/libxml/tree.h: Fixed a bug in xmlDOMWrapAdoptNode(); the tree traversal stopped if the very first given node had an attribute node :-( This was due to a missed check in the traversal mechanism. Expanded the xmlDOMWrapCtxt: it now holds the namespace map used in xmlDOMWrapAdoptNode() and xmlDOMWrapCloneNode() for reuse, so the map items don't need to be created for every cloning/adoption. Added a callback function to it for retrieval of xmlNsPtr to be set on node->ns; this is needed for my custom handling of ns-references in my DOM wrapper. Replaced the code which created the XML namespace decl on the doc with a call to xmlTreeEnsureXMLDecl(). Removed those nasty "warnings" from the docs of the clone/adopt functions; they work fine on my side.	2006-06-16 19:46:26 +00:00
Kasimier T. Buchcik	43ceb1ec88	Got rid of a compiler warning in xmlGetNodePath(). * tree.c: Got rid of a compiler warning in xmlGetNodePath().	2006-06-12 11:08:18 +00:00
Kasimier T. Buchcik	d38c63f329	Fixed xmlGetNodePath() to generate the node test "*" for elements in the tree.c: Fixed xmlGetNodePath() to generate the node test "*" for elements in the default namespace, rather than generating an unprefixed named node test and losing the namespace information.	2006-06-12 10:58:24 +00:00
Rob Richards	a512d76edc	Revert behavior change in xmlSetProp to handle attributes with colons in * tree.c: Revert behavior change in xmlSetProp to handle attributes with colons in name and no namespace.	2006-05-22 11:34:44 +00:00
Daniel Veillard	b2f8f1de7a	preparing 2.6.24 release, fixed Python paths at the last moment fix some * NEWS configure.in doc//: preparing 2.6.24 release, fixed Python paths at the last moment relaxng.c testapi.c tree.c: fix some comments Daniel	2006-04-28 16:30:48 +00:00
Daniel Veillard	973dceb768	fix compilation without tree Daniel * tree.c: fix compilation without tree Daniel	2006-04-25 20:22:20 +00:00
Daniel Veillard	11ce4004d8	end of first pass on coverity reports. Daniel * runtest.c schematron.c testAutomata.c tree.c valid.c xinclude.c xmlcatalog.c xmlreader.c xmlregexp.c xpath.c: end of first pass on coverity reports. Daniel	2006-03-10 00:36:23 +00:00
Kasimier T. Buchcik	4435341da4	Simplified usage of the internal xmlNsMap. Added a "strict" lookup for * tree.c: Simplified usage of the internal xmlNsMap. Added a "strict" lookup for namespaces based on a prefix. Fixed a namespace processing issue in the clone-node function, which occurred if a @ctxt argument was given.	2006-03-06 13:26:16 +00:00
Kasimier T. Buchcik	30f874d7e6	Bundled lookup of attr-nodes and retrieving their values into the * tree.c: Bundled lookup of attr-nodes and retrieving their values into the functions xmlGetPropNodeInternal() and xmlGetPropNodeValueInternal(). Changed relevant code to use those functions.	2006-03-02 18:04:29 +00:00
Rob Richards	6581512a0c	Fix the add sibling functions when passing attributes. Modify testing for * tree.c: Fix the add sibling functions when passing attributes. Modify testing for ID in xmlSetProp. No longer remove IDness when unlinking or replacing an attribute.	2006-02-25 17:13:33 +00:00
Kasimier T. Buchcik	eb46870850	Fixed bug #328896 reported by Liron. The path for text- and * tree.c: Fixed bug #328896 reported by Liron. The path for text- and CDATA-section-nodes was computed incorrectly in xmlGetNodePath().	2006-02-15 10:57:50 +00:00
Kasimier T. Buchcik	cab801b163	Added an initial version of xmlDOMWrapCloneNode() to the API. It will be * tree.c: Added an initial version of xmlDOMWrapCloneNode() to the API. It will be used to reflect DOM's Node.cloneNode and Document.importNode methods. The pros: 1) non-recursive, 2) optimized ns-lookup (mostly pointer comparison), 3) user defined ns-lookup, 4) save ns-processing. The function is in an unfinished and experimental state and should be only used to test it.	2006-02-03 16:35:27 +00:00
Kasimier T. Buchcik	e8f8d75166	Fixed some bugs xmlDOMWrapReconcileNamespaces() wrt the previous addition * tree.c: Fixed some bugs xmlDOMWrapReconcileNamespaces() wrt the previous addition of the removal of redundant ns-decls.	2006-02-02 12:13:07 +00:00
Kasimier T. Buchcik	e01b2fd776	Enhanced xmlDOMWrapReconcileNamespaces() to remove redundant ns-decls if * tree.c: Enhanced xmlDOMWrapReconcileNamespaces() to remove redundant ns-decls if the option XML_DOM_RECONNS_REMOVEREDUND was given. Note that I haven't moved this option to the header file yet; so just call this function with an @option of 1 to test the behaviour.	2006-02-01 16:36:13 +00:00


460 Commits