The main changes are where the internals of the buffer structures
were addressed directly; we now use routines coming from buf.h.
The routine xmlParserInputRead(), which wasn't used anywhere, is
deprecated too.
Since the whole set of structures was public, the only way
to switch to size_t-clean buffers is to introduce an incompatible
API change. Modifying the xmlParserInputBuffer and xmlOutputBuffer
structures is the best place to make this change, as those
structures sit deep in the parser's data feeding and no public
API suggests building them manually (see the sketch after the
file list below).
* encoding.c: adds xmlCharEncFirstLineInput, xmlCharEncInput and
xmlCharEncOutput
* enc.h: the functions are not made public but added to this new header;
this also adds converter functions between xmlBuf and xmlBuffer
* buf.c buf.h: the old xmlBuffer routines, modified for size_t and
operating on xmlBuf instead of xmlBuffer
* Makefile.am: add the 2 new files
* include/libxml/xmlerror.h: add an entry for the new module
* include/libxml/tree.h: expose the xmlBufPtr type but not the
structure, which stays private
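A minimal sketch of the new style, assuming libxml2 >= 2.9 where the
public xmlBufContent() and xmlBufUse() accessors exist; code that used
to poke at in->buffer->content and the int-sized in->buffer->use now
goes through routines and gets size_t-clean lengths:

    #include <stdio.h>
    #include <libxml/tree.h>
    #include <libxml/xmlIO.h>

    static void
    dump_raw(xmlParserInputBufferPtr in) {
        /* old style (pre-2.9): in->buffer->content, in->buffer->use */
        const xmlChar *data = xmlBufContent(in->buffer);
        size_t len = xmlBufUse(in->buffer);   /* size_t, not int */
        if (data != NULL)
            fwrite(data, 1, len, stdout);
    }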
External parsed entities are no longer loaded unless explicitly asked
for, i.e. when validating or when replacing entities with their value.
Problem pointed out by Tom Lane <tgl@redhat.com>
* parser.c: do not load external parsed entities unless needed
* test/errors/extparsedent.xml result/errors/extparsedent.xml*:
add a regression test to prevent the behaviour from changing in the future
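A hedged sketch of the resulting caller-visible behaviour ("doc.xml" is
a made-up file name): by default external parsed entities stay
unloaded, and the existing parser options are how one explicitly asks
for them:

    #include <libxml/parser.h>

    int main(void) {
        xmlInitParser();
        /* default: external parsed entities are not fetched */
        xmlDocPtr lazy = xmlReadFile("doc.xml", NULL, 0);
        /* explicit opt-in: replacing entities with their value
         * (or DTD validation) still loads them */
        xmlDocPtr full = xmlReadFile("doc.xml", NULL,
                                     XML_PARSE_NOENT | XML_PARSE_DTDVALID);
        xmlFreeDoc(lazy);
        xmlFreeDoc(full);
        xmlCleanupParser();
        return 0;
    }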
tsan reported that rand() is not thread safe, so create
a thread-safe wrapper, using rand_r() if available.
Consolidate the function, initialization and cleanup in
dict.c and make sure it is initialized in xmlInitParser()
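The shape of the wrapper, as a sketch rather than the actual dict.c
code (HAVE_RAND_R and the function names are illustrative):

    #include <stdlib.h>
    #include <time.h>

    #ifdef HAVE_RAND_R
    static unsigned int rand_seed;  /* concurrent callers must serialize
                                       access or use a per-thread seed */
    #endif

    static void
    xml_rand_init(void) {           /* hooked into xmlInitParser() */
    #ifdef HAVE_RAND_R
        rand_seed = (unsigned int) time(NULL);
    #else
        srand((unsigned int) time(NULL));
    #endif
    }

    static int
    xml_rand(void) {
    #ifdef HAVE_RAND_R
        return rand_r(&rand_seed);  /* re-entrant variant when available */
    #else
        return rand();              /* fallback, not thread safe by itself */
    #endif
    }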
For https://bugzilla.gnome.org/show_bug.cgi?id=654567
I use xmlTextReader to parse files that might be incomplete. These files are
the beginning of a well-formed file, but the end is missing so the file as a
whole is not well-formed.
The problem is that xmlTextReader starts returning errors when it encounters
the early EOF, even though I haven't finished reading all of the valid data in
the file. It would be helpful if xmlTextReader kept working until the very
end.
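Typical reading loop for such a file, sketched with the public reader
API ("partial.xml" would be the placeholder file name); the point of
the request is that every node before the early EOF should still come
back with ret == 1:

    #include <stdio.h>
    #include <libxml/xmlreader.h>

    static void
    read_available(const char *filename) {
        xmlTextReaderPtr reader = xmlReaderForFile(filename, NULL, 0);
        if (reader == NULL)
            return;
        int ret;
        while ((ret = xmlTextReaderRead(reader)) == 1) {
            /* consume the node, e.g. print its name */
            printf("%s\n", xmlTextReaderConstName(reader));
        }
        if (ret < 0)
            fprintf(stderr, "parse error (possibly truncated input)\n");
        xmlFreeTextReader(reader);
    }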
For https://bugzilla.gnome.org/show_bug.cgi?id=622023
when compiled with LDFLAGS="${LDFLAGS} -Wl,-z,-defs -Wl,--no-undefined"
the python module would fail due to undefined symbols. This adds an
explicit reference to the python library.
For https://bugzilla.gnome.org/show_bug.cgi?id=643148
Reported by Bill Clarke <llib@computer.org>: a global variable was used
as a counter for the input id, and this was not thread safe. To avoid the
race without adding unneeded locking in the parser path, move the id to
the parser context instead.
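A sketch of the race and of the fix; the field names are illustrative,
not a quote of the real structures:

    #include <libxml/parser.h>

    /* Before: a file-scope counter shared by every context.
     *
     *     static int input_id = 1;
     *     input->id = input_id++;      // data race across threads
     *
     * After: the counter lives in the parser context, which is
     * not shared between threads. */
    static xmlParserInputPtr
    assign_input_id(xmlParserCtxtPtr ctxt, xmlParserInputPtr input) {
        input->id = ctxt->input_id++;   /* no shared mutable state */
        return input;
    }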
The freeing function wasn't called because of a bogus #ifdef
surrounding it. Also switch the code to use the normal libxml2
allocation and freeing routines.
On Fri, May 11, 2012 at 9:10 AM, Daniel Veillard <veillard@redhat.com> wrote:
> Hi Conrad,
>
> that's interesting ! I was initially afraid of a sudden explosion of
> memory allocations for building a tree since by default buffers tend to
> "waste" memory by using doubling allocations, but that's not the case.
> xmllint --noout doc/libxml2-api.xml
> when compiled with memory debug produces
>
> paphio:~/XML -> cat .memdump
> MEMORY ALLOCATED : 0, MAX was 12756699
>
> and without your patch 12755657, i.e. the increase is minimal.
Heh, I thought that too. Actually you're looking at the result with XML_ALLOC_EXACT! This
is because EXACT adds 10 bytes of "spare" on each alloc, and that interestingly wastes about
the same amount of space as XML_ALLOC_DOUBLEIT on this example (see below).
So it turns out that the default realloc() on my system actually handles this case really
well — and I guess that all the time in xmlRealloc() was actually in xmlStrlen, not the
underlying realloc() after all (sorry for misleading you). If you replace the realloc()
with a bad one (like valgrind's), then the performance degrades severely.
This patch implements a HYBRID allocator which has the behaviour you describe (it's
like EXACT to start with, though without the spare 10 bytes; and switches to DOUBLEIT
after 4kb) — that gets the memory back down to 12755657, with no noticeable impact on the
performance of the synthetic pathological example under valgrind.
In summary:
max_memory on ./xmllint --noout doc/libxml2-api.xml,
valgrind time on https://gist.github.com/2656940
           | max_memory | valgrind time
  before   | 12755657   | 29:18.2
  EXACT    | 12756699   |  2:58.6  <-- this is the state after the first patch.
  DOUBLEIT | 12756727   |  0:02.7
  HYBRID   | 12755754   |  0:02.7  <-- this is the state with both patches.
>
> There is also the cost of creating the buffers all the time.
> I need to read the code and check but I may be interested in a hybrid
> approach where we switch to a buffer only when the text node starts to
> become too big (4k would cover nearly all usual types of "document"
> usage, i.e. not blocks of data)
I tried to avoid too much buffer creation by introducing the xmlBufferDetach function,
which allows reusing one buffer to construct many strings. It's maybe a bit of a "hack"
in API terms, though I thought the gains would be worth it.
Conrad
------8<------
To keep memory usage tight in normal conditions it's desirable to only
allocate as much space as is needed. Unfortunately this can lead to
problems when constructing a long string out of small chunks, because
every chunk you add will need to resize the buffer.
To fix this XML_ALLOC_HYBRID will switch (when the buffer is 4kb big)
from using exact allocations to doubling buffer size every time it is
full. This limits the number of buffer resizes to O(log n) (down from
O(n)), and thus greatly increases the performance of constructing very
large strings in this manner.
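An illustrative growth policy; the 4096 threshold matches the
description above, but the names and structure are ours, not the
library internals:

    #include <stddef.h>

    #define HYBRID_THRESHOLD 4096

    static size_t
    hybrid_new_size(size_t current, size_t needed) {
        if (needed <= HYBRID_THRESHOLD)
            return needed;            /* exact: keeps small nodes tight */
        size_t size = (current > HYBRID_THRESHOLD)
                      ? current : HYBRID_THRESHOLD;
        while (size < needed)
            size *= 2;                /* doubling: O(log n) resizes */
        return size;
    }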
Hi Veillard and all,
Firstly, thanks for libxml: it's awesome!
I noticed recently that libxml was taking a surprisingly long time to perform some
operations (many minutes instead of milliseconds), and so I did some digging. It turns out
that the problem was caused by the realloc()ing done in xmlNodeAddContentLen() which can
be called many (many) times when assigning some content into a node.
For background, I'm dealing with XML that contains emails; these can have large
attachments (~6MB) which are base-64 encoded, line-wrapped at 78 chars, and each line ends
with &#13;. This means that xmlNodeAddContentLen() is being called about 200,000 times,
and so there are 200,000 reallocs of a 6MB string, which takes a while... (I put a synthetic
example of this at https://gist.github.com/2656940)
The attached patch works around that problem by using the existing buffer API to merge the
strings together before even creating the text node; this keeps the number of realloc()s
at a manageable level.
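A sketch of that approach using the public buffer API together with the
xmlBufferDetach function mentioned above (chunks/nchunks are
placeholders, and directly assigning the detached string to the node is
our shortcut, not the patch's actual code):

    #include <libxml/tree.h>

    static xmlNodePtr
    build_text_node(xmlDocPtr doc, const xmlChar **chunks, int nchunks) {
        xmlBufferPtr buf = xmlBufferCreate();
        if (buf == NULL)
            return NULL;
        for (int i = 0; i < nchunks; i++)
            xmlBufferAdd(buf, chunks[i], -1);  /* amortized growth */
        /* detach hands over the accumulated string, so the buffer
         * could be reused for the next one */
        xmlChar *content = xmlBufferDetach(buf);
        xmlBufferFree(buf);
        xmlNodePtr text = xmlNewDocText(doc, NULL);
        if (text != NULL)
            text->content = content;           /* node owns it now */
        else
            xmlFree(content);
        return text;
    }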
I'd love feedback on the patch, and am happy to fix problems with it, or explore other
solutions if you think that this is barking up the wrong tree :).
Thanks,
Conrad
P.S. Should I create a bug for this too?
------8<------
Before this change xmlStringGetNodeList would perform a realloc() of the
entire new content for every XML entity in the assigned text in order to
merge adjacent text nodes together. This had the effect of making
xmlNodeSetContent O(n^2), which led to unexpectedly bad performance on
inputs that contained a large number of XML entities.
After this change the memory management is done by the buffer API,
avoiding the need to continually re-measure and realloc() the string.
For my test data (6MB of 80 character lines, each ending with &#13;)
this takes the time of xmlNodeSetContent from about 500 seconds down to
around 50ms. I have not profiled smaller cases, though I tried to
minimize the performance impact of my change by avoiding unnecessary
string copying.
Signed-off-by: Conrad Irwin <conrad.irwin@gmail.com>
For https://bugzilla.gnome.org/show_bug.cgi?id=615785
When the <noscript> is found, <head> is closed and a <body> element is created.
The real <body id="xxx"> gets skipped over, so I can't see any of the
body's attributes.
Just don't close <head> when encountering a <noscript>.
Add a regression test too.
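What the regression boils down to, as a hedged sketch (the HTML string
and the check are ours, not the actual test file):

    #include <stdio.h>
    #include <libxml/HTMLparser.h>

    int main(void) {
        const char *html =
            "<html><head><noscript><meta name=\"x\"></noscript></head>"
            "<body id=\"xxx\"><p>hi</p></body></html>";
        htmlDocPtr doc = htmlReadDoc(BAD_CAST html, NULL, NULL, 0);
        xmlNodePtr root = xmlDocGetRootElement(doc);  /* <html> */
        for (xmlNodePtr cur = root->children; cur != NULL; cur = cur->next) {
            if (xmlStrEqual(cur->name, BAD_CAST "body")) {
                xmlChar *id = xmlGetProp(cur, BAD_CAST "id");
                /* expect "xxx"; before the fix the attribute was lost */
                printf("body id = %s\n", id ? (char *) id : "(lost)");
                xmlFree(id);
            }
        }
        xmlFreeDoc(doc);
        return 0;
    }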
For https://bugzilla.gnome.org/show_bug.cgi?id=609796
Libxml2 fails to validate an instance document against a schema when an
element's type is a complex extension of some base type with an optional
child element, and that child element is not specified in the instance
document. For example, suppose I have some complex type BaseType that is
defined to have one child element in a sequence group with minOccurs set to 0.
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The python tests were reporting errors; some were due to
a small change in the case of encoding names, but the main one was
htmlSetMetaEncoding(doc, NULL) being broken by no longer removing
the associated meta tag
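The behaviour the tests depend on, sketched with the public API:

    #include <libxml/HTMLtree.h>

    static void
    reset_encoding(htmlDocPtr doc) {
        htmlSetMetaEncoding(doc, BAD_CAST "UTF-8"); /* adds/updates meta */
        htmlSetMetaEncoding(doc, NULL);  /* must remove the meta tag again,
                                            which is what had regressed */
    }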
For https://bugzilla.gnome.org/show_bug.cgi?id=642916
I just noticed that the HTML_PARSE_NOIMPLIED flag that you can pass to the
HTML-Parser methods doesn't do anything. Its intended purpose is to stop the
HTML-parser from forcibly adding a pair of html/body tags if the stream does
not contain any.
This is highly useful when you don't need this level of strictness.
Unfortunately, specifying it doesn't work, because the option is not
copied into the parsing context.
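How the option is meant to be used once it actually reaches the context
("fragment.html" is a placeholder URL):

    #include <libxml/HTMLparser.h>

    static htmlDocPtr
    parse_fragment(const char *buf, int len) {
        /* with the fix, no implied <html>/<body> pair is synthesized */
        return htmlReadMemory(buf, len, "fragment.html", NULL,
                              HTML_PARSE_NOIMPLIED | HTML_PARSE_NOERROR);
    }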
For https://bugzilla.gnome.org/show_bug.cgi?id=643949
When an error occurs while creating an IO input, the given context
is terminated with the given close function, except when the
error happened in xmlParserInputBufferCreateIO. This could
lead to a resource leak, which is fixed by this patch.
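A sketch of the ownership contract involved (my_read/my_close/myctx are
placeholders); the last two lines show the workaround callers needed
before the fix, since a failure inside xmlParserInputBufferCreateIO
left the context unclosed:

    #include <libxml/encoding.h>
    #include <libxml/xmlIO.h>

    static xmlParserInputBufferPtr
    make_input(xmlInputReadCallback my_read, xmlInputCloseCallback my_close,
               void *myctx) {
        xmlParserInputBufferPtr buf =
            xmlParserInputBufferCreateIO(my_read, my_close, myctx,
                                         XML_CHAR_ENCODING_NONE);
        if (buf == NULL && my_close != NULL)
            my_close(myctx);   /* pre-fix: caller had to clean up itself */
        return buf;
    }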
For https://bugzilla.gnome.org/show_bug.cgi?id=655218
http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#the-meta-element
"""
The charset attribute specifies the character encoding used by the document.
This is a character encoding declaration. If the attribute is present in an XML
document, its value must be an ASCII case-insensitive match for the string
"UTF-8" (and the document is therefore forced to use UTF-8 as its
encoding).
"""
However, while <meta http-equiv="Content-Type" content="text/html;
charset=utf8"> works, <meta charset="utf8"> does not.
While the libxml2 HTML parser is not tuned for HTML5, this is a simple
addition.
Also added a test case.
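A quick way to see the effect, checking the detected encoding on two
minimal documents (the strings and the doc->encoding check are ours,
not the actual test case):

    #include <stdio.h>
    #include <libxml/HTMLparser.h>

    int main(void) {
        const char *legacy =
            "<html><head><meta http-equiv=\"Content-Type\""
            " content=\"text/html; charset=utf8\"></head></html>";
        const char *html5 =
            "<html><head><meta charset=\"utf8\"></head></html>";
        htmlDocPtr d1 = htmlReadDoc(BAD_CAST legacy, NULL, NULL, 0);
        htmlDocPtr d2 = htmlReadDoc(BAD_CAST html5, NULL, NULL, 0);
        /* after the fix both should report the same encoding */
        printf("legacy: %s\n", d1->encoding ? (char *) d1->encoding : "(none)");
        printf("html5 : %s\n", d2->encoding ? (char *) d2->encoding : "(none)");
        xmlFreeDoc(d1);
        xmlFreeDoc(d2);
        return 0;
    }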