My attempt to optimize XPath expressions containing '//' caused a
regression reported in bug #695699. This commit disables the
optimization for expressions of the form '//foo[predicate]'.
This patch adds xmlXPathSetContextNode and xmlXPathNodeEval,
which make it easier to evaluation XPath expressions with a
context node other than the document root without poking about
inside the internals of the context.
This patch is compile-tested only, and is my first libxml2
contribution, so please go easy.
Signed-off-by: Alex Bligh <alex@alex.org.uk>
looping 1000 time on an error stating that a nodeset has
grown out of control is useless, make sure we percolate
error up to the various loops and break when errors occurs
I use libxml xpath engine on quite large (and mostly "flat") xml files.
It seems that Shellsort, that is used in xmlXPathNodeSetSort is a
performance bottleneck for my case. I have read some posts about sorting
in libxml in the libxml archive, but I agree that qsort was not the way
to go. I experimented with Timsort instead and my results were good for
me. For about 10000 nodes, my test was about 5x faster with Timsort,
for 1000 nodes about 10% faster, for small data files, the difference
was not measurable.
* timsort.h: the algorithm, kept in a separate header
* xpath.c: plug in the new algorithm in xmlXPathNodeSetSort
* Makefile.am: add the header to the EXTRA_DIST
* doc/apibuild.py: avoid indexing the new header
When investigating the libxslt performance problem reported in bug
#657665, I found that '//' in XPath expressions can be very slow when
working on large subtrees.
One of the reasons is the seemingly quadratic time complexity of the
duplicate checks when merging result nodes. The other is a missed
optimization for expressions of the form
'descendant-or-self::node()/axis::test'. Since '//' is expanded to
'/descendant-or-self::node()/', this type of expression is quite common.
Depending on the axis of the expression following the
'descendant-or-self' step, the following replacements can be made:
from descendant-or-self::node()/child::test
to descendant::test
from descendant-or-self::node()/descendant::test
to descendant::test
from descendant-or-self::node()/self::test
to descendant-or-self::test
from descendant-or-self::node()/descendant-or-self::test
to descendant-or-self::test
'test' can be any kind of node test.
With these replacements the possibly huge result of
'descendant-or-self::node()' doesn't have to be stored temporarily, but
can be processsed in one pass. If the resulting nodeset is small, the
duplicate checks aren't a problem.
I found that there already is a function called
xmlXPathRewriteDOSExpression which performs this optimization for a very
limited set of cases. It employs a complicated iteration scheme for
rewritten expressions. AFAICS, this can be avoided by simply changing
the axis of the expression like described above.
With the attached patch against libxml2 and the files from bug #657665 I
got the following results.
Before:
$ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
service-names-port-numbers.xml
real 2m56.213s
user 2m56.123s
sys 0m0.080s
After:
$ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
service-names-port-numbers.xml
real 0m3.836s
user 0m3.764s
sys 0m0.060s
I also ran the libxml2 and libxslt test suites with the patch and
couldn't detect any breakage.
Nick
>From e0f5a8261760e4f257b90410be27657e984237c8 Mon Sep 17 00:00:00 2001
From: Nick Wellnhofer <wellnhofer@aevum.de>
Date: Sun, 19 Aug 2012 18:20:22 +0200
Subject: [PATCH] Optimizations for descendant-or-self::node()
Currently, the function xmlXPathRewriteDOSExpression optimizes expressions
of type '//child'. Instead of adding a 'rewriteType' and doing a compound
traversal, the same can be achieved simply by setting the axis of the node
test from 'child' to 'descendant'.
There are also many other cases that can be optimized similarly. This
commit augments xmlXPathRewriteDOSExpression to essentially rewrite the
following subexpressions:
- descendant-or-self::node()/child:: to descendant::
- descendant-or-self::node()/descendant:: to descendant::
- descendant-or-self::node()/self:: to descendant-or-self::
- descendant-or-self::node()/descendant-or-self:: to descendant-or-self::
Since the '//' shortcut in XPath is translated to
'/descendant-or-self::node()/', this greatly speeds up expressions using
'//' on large subtrees.
This adds some internal limitationson XPath expression complexity,
and limits at runtime like depth of the stack and maximum size
for nodeset.
* xpath.c: implement the above as well as the maximum Name lenght
The count was incremented before the allocation
and not fixed in case of failure
* xpath.c: corrects a few instances where the available count of some
structure is updated before we know the allocation actually
succeeds
This was introduced in the prevous fix, while preceding-sibling and
following sibling axis are empty for attributes and namespaces,
preceding and following axis should still work based on the parent
element. However the parent element is not available for a namespace
node, so we keep the axis empty in that case.
* SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c
relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c
xmlschemas.c xpath.c xpointer.c: mostly removing unneded affectations,
but this led to a few real bugs and some part not yet understood
(relaxng/interleave)
* HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c
testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c:
fix unused variables, or unneeded increments as well as a couple
of space issues
* runtest.c: check for NULL before calling unlink()
* include/wsockcompat.h win32/Makefile.bcb xpath.c: fixes for
Borland/CodeGear/Embarcadero compilers by Eric Zurcher
Daniel
svn path=/trunk/; revision=3822
* include/libxml/parser.h include/libxml/xmlwriter.h
include/libxml/relaxng.h include/libxml/xmlversion.h.in
include/libxml/xmlwin32version.h.in include/libxml/valid.h
include/libxml/xmlschemas.h include/libxml/xmlerror.h:
port patch from Marcus Meissner to add gcc checking for
printf like functions parameters, should fix#65068
* doc/apibuild.py doc/*: modified the script accordingly
and regenerated
* xpath.c xmlmemory.c threads.c: fix a few warnings
Daniel
svn path=/trunk/; revision=3813
* schematron.c xpath.c: applied a couple of patches from Martin
avoiding some leaks, fixinq QName checks in XPath, XPath debugging
and schematron code cleanups.
* python/tests/Makefile.am python/tests/xpathleak.py: add the
specific regression tests, just tweak it to avoid output by default
Daniel
svn path=/trunk/; revision=3791
* nanohttp.c: fixed problem on gzip streams (bug #438045)
* xpath.c: fixed minor spot of redundant code - no logic change.
svn path=/trunk/; revision=3616
* xpath.c: enhanced the coding for xmlXPathCastNumberToString
in order to produce the required number of significant digits
(bug #437179)
svn path=/trunk/; revision=3615
* xpath.c: fixed xmlXPathCmpNodes for incorrect result on certain
cases when comparing identical nodes (bug 415567) with patch
from Oleg Paraschenko
svn path=/trunk/; revision=3587