diff --git a/doc/API.html b/doc/API.html index 4f411905..2eac1991 100644 --- a/doc/API.html +++ b/doc/API.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet diff --git a/doc/bugs.html b/doc/bugs.html index 15c13ab7..40a016d3 100644 --- a/doc/bugs.html +++ b/doc/bugs.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet diff --git a/doc/contribs.html b/doc/contribs.html index 4c279b5e..361eeb5b 100644 --- a/doc/contribs.html +++ b/doc/contribs.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet diff --git a/doc/docs.html b/doc/docs.html index 9dab8ed1..320c3e34 100644 --- a/doc/docs.html +++ b/doc/docs.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet diff --git a/doc/downloads.html b/doc/downloads.html index 0f0ac609..bdc3a44f 100644 --- a/doc/downloads.html +++ b/doc/downloads.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet diff --git a/doc/extensions.html b/doc/extensions.html index 29d5ef0f..32ef673e 100644 --- a/doc/extensions.html +++ b/doc/extensions.html @@ -1,129 +1,144 @@ - + - Writing extensions for XSLT C library for Gnome - - + + +Writing extensions - - -

    - -

    Writing extensions for XSLT C library for Gnome

    - -

    - -

    Location: http://xmlsoft.org/XSLT/extensions.html

    - -

    Libxslt home page: http://xmlsoft.org/XSLT/

    - -

    mailing-list archives: http://mail.gnome.org/archives/xslt/

    - -

    Version: $Revision$

    - -

    Table of content

    + + + + +
    +Gnome LogoRed Hat Logo +
    +

    The XSLT C library for Gnome

    +

    Writing extensions

    +
    +
    + + +
    + + + +
    Main Menu
    + + + +
    Related links
    +
    +

    Table of content

    - -

    Introduction

    - +

    Introduction

    This document describes the work needed to write extensions to the -standard XSLT library for use with libxslt, the XSLT C library developed for the Gnome project.

    - +standard XSLT library for use with libxslt, the XSLT C library developed for the Gnome project.

    Before starting reading this document it is highly recommended to get familiar with the libxslt internals.

    -

    Note: this documentation is by definition incomplete and I am not good at -spelling, grammar, so patches and suggestions are really welcome.

    - -

    Basics

    - +spelling, grammar, so patches and suggestions are really welcome.

    +

    Basics

    The XSLT specification provides two ways to extend an XSLT engine:

    -

    In both cases the extensions need to be associated with a new namespace, i.e. a URI used as the name for the extension's namespace (there is no need to have a resource there for this to work).

    -

    libxslt provides a few extensions itself, either in libxslt namespace -"http://xmlsoft.org/XSLT/" or in other namespace for well known extensions +"http://xmlsoft.org/XSLT/" or in other namespace for well known extensions provided by other XSLT processors like Saxon, Xalan or XT.

    - -

    Extension modules

    - +

    Extension modules

    Since extensions are bound to a namespace name, usually sets of extensions coming from a given source are using the same namespace name defining in practice a group of extensions providing elements, functions or both. From -libxslt point of view those are considered as an "extension module", and most +libxslt point of view those are considered as an "extension module", and most of the APIs work at a module point of view.

    -

    Registration of new functions or elements is bound to the activation of the module; this is currently done by declaring the namespace as an extension namespace using the extension-element-prefixes attribute on the xsl:stylesheet element.

    -

    An extension module is defined by 3 objects:

      -
    • the namespace name associated
    • -
    • an initialization function
    • -
    • a shutdown function
    • +
    • the namespace name associated
    • +
    • an initialization function
    • +
    • a shutdown function
    - -

    Registering a module

    - +

    Registering a module

    Currently a libxslt module has to be compiled within the application using libxslt; there is no code to dynamically load shared libraries associated with a namespace (this may be added but is likely to become a portability nightmare).

    -

    So the current way to register a module is to link the code implementing it with the application and to call a registration function:

    int xsltRegisterExtModule(const xmlChar *URI,
                               xsltExtInitFunction initFunc,
                               xsltExtShutdownFunction shutdownFunc);
    -

    The associated header is read by:

    #include<libxslt/extensions.h>
    -

    which also defines the type for the initialization and shutdown functions

    - -

    Loading a module

    - +

    Loading a module

    Once the module URI has been registered, and if the XSLT processor detects that a given stylesheet needs the functionality of an extension module, that module is initialized.

    -

    The xsltExtInitFunction type defines the interface for an initialization function:

    /**
    @@ -139,46 +154,39 @@ function:

    */
    typedef void *(*xsltExtInitFunction)(xsltTransformContextPtr ctxt,
                                         const xmlChar *URI);
    -

    There are 3 things to notice:

      -
    • the function gets passed the namespace name URI as an argument, this +
    • the function gets passed the namespace name URI as an argument; this allows a single function to provide the initialization for multiple logical modules
    • -
    • it also gets passed a transformation context, the initialization is +
    • it also gets passed a transformation context, the initialization is done at run time before any processing occurs on the stylesheet but it will be invoked separately each time for each transformation
    • -
    • it returns a pointer, this can be used to store module specific +
    • it returns a pointer, which can be used to store module-specific information that can be retrieved later when a function or an element from the extension is used; an obvious example is a connection to a database which should be kept and reused throughout the transformation. NULL is a perfectly valid return value; there is no way to indicate a failure at this level
    -

    What this function is expected to do is:

      -
    • prepare the context for this module (like opening the database +
    • prepare the context for this module (like opening the database connection)
    • -
    • register the extensions specific to this module
    • +
    • register the extensions specific to this module
    - -

    Registering an extension function

    - +

    Registering an extension function

    There is a single call to do this registration:

    int xsltRegisterExtFunction(xsltTransformContextPtr ctxt,
                                 const xmlChar *name,
                                 const xmlChar *URI,
                                 xmlXPathEvalFunc function);
    -

    The registration is bound to a single transformation instance referred to by ctxt; name is the UTF-8 encoded NCName of the function, and URI is the namespace name for the extension (no checking is done, a module could register functions or elements from a different namespace, but it is not recommended).

    - -

    Implementing an extension function

    - +

    Implementing an extension function

    The implementation of the function must have the signature of a libxml XPath function:

    /**
    @@ -192,21 +200,18 @@ XPath function:

    typedef void (*xmlXPathEvalFunc)(xmlXPathParserContextPtr ctxt, int nargs);
    - -

    The context passed to an XPath function is not an XSLT context but an XPath context. However it is possible to +

    The context passed to an XPath function is not an XSLT context but an XPath context. However it is possible to find one from the other:

      -
    • The function xsltXPathGetTransformContext provide this lookup facility: +
    • The function xsltXPathGetTransformContext provides this lookup facility:
      xsltTransformContextPtr
                xsltXPathGetTransformContext
                                 (xmlXPathParserContextPtr ctxt);
      -
    • -
    • The xmlXPathContextPtr associated to an +
    • +
    • The xmlXPathContextPtr associated to an xsltTransformContext is stored in the xpathCtxt field.
    -

    The first thing an extension function may want to do is to check the arguments passed on the stack; nargs gives the number of arguments that were provided in the XPath expression. The macro valuePop will @@ -215,10 +220,8 @@ extract them from the XPath stack:

    #include <libxml/xpathInternals.h>
    xmlXPathObjectPtr obj = valuePop(ctxt);
    -

    Note that ctxt is the XPath context not the XSLT one. It is -then possible to examine the content of the value. Check the description of XPath objects if +then possible to examine the content of the value. Check the description of XPath objects if necessary. The following is a common sequence checking whether the argument passed is a string and converting it using the built-in XPath string() function if this is not the case:

    @@ -227,30 +230,26 @@ passed is a string and converting it using the built-in XPath
         xmlXPathStringFunction(ctxt, 1);
         obj = valuePop(ctxt);
     }
    -

    Most common XPath functions are available directly at the C level and are exported either in <libxml/xpath.h> or in <libxml/xpathInternals.h>.

    -

    The extension function may also need to retrieve the data associated with this module instance (the database connection in the previous example); this can be done using xsltGetExtData:

    void * xsltGetExtData(xsltTransformContextPtr ctxt,
                           const xmlChar *URI);
    -

    Again, the URI to be provided is the one which was used when registering the module.

    -

    Once the function finishes, don't forget to:

      -
    • push the return value on the stack using valuePush(ctxt, - obj)
    • -
    • deallocate the parameters passed to the function using - xmlXPathFreeObject(obj)
    • +
    • push the return value on the stack using valuePush(ctxt, + obj) +
    • +
    • deallocate the parameters passed to the function using + xmlXPathFreeObject(obj) +
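The calling convention described above (pop the arguments, free them, push exactly one result) can be modelled in plain C without libxml. The following is only a toy sketch, not libxml code: `ToyObj`, `ToyCtxt`, `toy_pop`, `toy_push`, `toy_free` and `toy_concat` are hypothetical names standing in for an XPath object, the XPath parser context, valuePop, valuePush, xmlXPathFreeObject and an extension function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an XPath string object. */
typedef struct ToyObj {
    char *str;
} ToyObj;

/* Hypothetical stand-in for the XPath parser context and its value stack. */
typedef struct ToyCtxt {
    ToyObj *stack[16];
    int depth;
} ToyCtxt;

static ToyObj *toy_new_string(const char *s) {
    ToyObj *obj = malloc(sizeof(*obj));
    assert(obj != NULL);
    obj->str = malloc(strlen(s) + 1);
    assert(obj->str != NULL);
    strcpy(obj->str, s);
    return obj;
}

static void toy_free(ToyObj *obj) {
    if (obj == NULL)
        return;
    free(obj->str);
    free(obj);
}

static void toy_push(ToyCtxt *ctxt, ToyObj *obj) {
    assert(ctxt->depth < 16);
    ctxt->stack[ctxt->depth++] = obj;
}

static ToyObj *toy_pop(ToyCtxt *ctxt) {
    if (ctxt->depth == 0)
        return NULL;
    return ctxt->stack[--ctxt->depth];
}

/* Toy extension function: concatenates its two string arguments. */
static void toy_concat(ToyCtxt *ctxt, int nargs) {
    ToyObj *first, *second, *result;

    if (nargs != 2 || ctxt->depth < 2)
        return;                  /* wrong arity: leave the stack untouched */
    second = toy_pop(ctxt);      /* the argument pushed last comes off first */
    first = toy_pop(ctxt);

    result = malloc(sizeof(*result));
    assert(result != NULL);
    result->str = malloc(strlen(first->str) + strlen(second->str) + 1);
    assert(result->str != NULL);
    strcpy(result->str, first->str);
    strcat(result->str, second->str);

    toy_free(first);             /* deallocate the consumed parameters */
    toy_free(second);
    toy_push(ctxt, result);      /* push the single return value */
}
```

After the call the stack holds one object, the result, which the caller owns; forgetting either the frees or the push would leak memory or leave the stack unbalanced.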
    - -

    Examples for extension functions

    - +

    Examples for extension functions

    The module libxslt/functions.c contains the sources of the XSLT built-in functions, including document(), key(), generate-id(), etc. as well as a full example module at the end. Here is the test function implementation for the
    @@ -271,40 +270,34 @@ xsltExtFunctionTest(xmlXPathParserContextPtr ctxt, int nargs)
         tctxt = xsltXPathGetTransformContext(ctxt);
         if (tctxt == NULL) {
             xsltGenericError(xsltGenericErrorContext,
    -            "xsltExtFunctionTest: failed to get the transformation context\n");
    +            "xsltExtFunctionTest: failed to get the transformation context\n");
             return;
         }
         data = xsltGetExtData(tctxt, (const xmlChar *) XSLT_DEFAULT_URL);
         if (data == NULL) {
             xsltGenericError(xsltGenericErrorContext,
    -            "xsltExtFunctionTest: failed to get module data\n");
    +            "xsltExtFunctionTest: failed to get module data\n");
             return;
         }
     #ifdef WITH_XSLT_DEBUG_FUNCTION
         xsltGenericDebug(xsltGenericDebugContext,
    -            "libxslt:test() called with %d args\n", nargs);
    +            "libxslt:test() called with %d args\n", nargs);
     #endif
     }
    - -

    Registering an extension function

    - +

    Registering an extension element

    There is a single call to do this registration:

    int xsltRegisterExtElement(xsltTransformContextPtr ctxt,
                                const xmlChar *name,
                                const xmlChar *URI,
                                xsltTransformFunction function);
    -

    It is similar to the mechanism used to register an extension function, except that the signature of an extension element implementation is different.

    -

    The registration is bound to a single transformation instance referred to by ctxt; name is the UTF-8 encoded NCName of the element, and URI is the namespace name for the extension (no checking is done, a module could register elements for a different namespace, but it is not recommended).

    - -

    Implementing an extension element

    - +

    Implementing an extension element

    The implementation of the element must have the signature of an XSLT transformation function:

    /** 
    @@ -322,34 +315,27 @@ typedef void (*xsltTransformFunction)
                                xmlNodePtr node,
                                xmlNodePtr inst,
                                xsltStylePreCompPtr comp);
    -

    The first argument is the XSLT transformation context. The second and -third arguments are xmlNodePtr i.e. internal memory representation of XML nodes. They are +third arguments are xmlNodePtr i.e. internal memory representation of XML nodes. They are respectively node from the input document being transformed by the stylesheet and inst the extension element in the stylesheet. The last argument is comp a pointer to a precompiled representation of inst but usually for extension elements this value is NULL by default (it could be added and associated to the instruction in inst->_private).

    -

    The same functions are available from a function implementing an extension element as in an extension function, including xsltGetExtData().

    -

    Since the goal of an extension element is usually to enrich the generated output, it is expected to grow the currently generated output tree. This can be done by grabbing ctxt->insert, which is the current libxml node being generated (note that this can also be the intermediate value tree being built, for example to initialize a variable; the processing should -be similar). The functions for libxml tree manipulation from <libxml/tree.h> can +be similar). The functions for libxml tree manipulation from <libxml/tree.h> can be employed to extend or modify the tree, but the insertion node and its ancestors must be preserved, since there are existing pointers to those elements still in use in the XSLT template execution stack.

    - -

    Example for extension elements

    - +

    Example for extension elements

    The module libxslt/transform.c contains the sources of the XSLT built-in elements, including xsl:element, xsl:attribute, xsl:if, etc. There is a small but full example in functions.c providing the implementation for the
    @@ -372,32 +358,30 @@ xsltExtElementTest(xsltTransformContextPtr ctxt, xmlNodePtr node,
         if (ctxt == NULL) {
             xsltGenericError(xsltGenericErrorContext,
    -            "xsltExtElementTest: no transformation context\n");
    +            "xsltExtElementTest: no transformation context\n");
             return;
         }
         if (node == NULL) {
             xsltGenericError(xsltGenericErrorContext,
    -            "xsltExtElementTest: no current node\n");
    +            "xsltExtElementTest: no current node\n");
             return;
         }
         if (inst == NULL) {
             xsltGenericError(xsltGenericErrorContext,
    -            "xsltExtElementTest: no instruction\n");
    +            "xsltExtElementTest: no instruction\n");
             return;
         }
         if (ctxt->insert == NULL) {
             xsltGenericError(xsltGenericErrorContext,
    -            "xsltExtElementTest: no insertion point\n");
    +            "xsltExtElementTest: no insertion point\n");
             return;
         }
         comment = xmlNewComment((const xmlChar *)
    -            "libxslt:test element test worked");
    +            "libxslt:test element test worked");
         xmlAddChild(ctxt->insert, comment);
     }
    - -

    The shutdown of a module

    - +

    The shutdown of a module

    When the XSLT processor ends a transformation, the shutdown function (if it exists) of every initialized module is called. The xsltExtShutdownFunction type defines the interface for a shutdown @@ -413,30 +397,24 @@ function:

    typedef void (*xsltExtShutdownFunction) (xsltTransformContextPtr ctxt,
                                             const xmlChar *URI,
                                             void *data);
    -

    This is very similar to a module initialization function, except that a third argument is passed: the value that was returned by the initialization function. This allows the module to deallocate its resources, for example to close the connection to the database, to keep the same example.
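The pairing between the initialization and shutdown functions can be sketched as a small self-contained model. This is a toy in plain C, not libxslt's implementation: all `toy_*` names are hypothetical, the transformation context argument is left out, the return codes are this sketch's own convention, and unlike libxslt (which only initializes the modules a stylesheet needs) the toy initializes every registered module.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the init and shutdown function types. */
typedef void *(*ToyInitFunc)(const char *uri);
typedef void (*ToyShutdownFunc)(const char *uri, void *data);

typedef struct ToyModule {
    const char *uri;
    ToyInitFunc init;
    ToyShutdownFunc shutdown;
} ToyModule;

#define TOY_MAX_MODULES 8
static ToyModule toy_modules[TOY_MAX_MODULES];
static int toy_nb_modules = 0;

/* Returns 0 on success, -1 on bad arguments, a full table or a duplicate URI. */
static int toy_register_module(const char *uri, ToyInitFunc init,
                               ToyShutdownFunc shutdown) {
    int i;

    if (uri == NULL || init == NULL || toy_nb_modules >= TOY_MAX_MODULES)
        return -1;
    for (i = 0; i < toy_nb_modules; i++)
        if (strcmp(toy_modules[i].uri, uri) == 0)
            return -1;
    toy_modules[toy_nb_modules].uri = uri;
    toy_modules[toy_nb_modules].init = init;
    toy_modules[toy_nb_modules].shutdown = shutdown;
    toy_nb_modules++;
    return 0;
}

/* One simulated transformation: initialize each module, then shut each one
 * down, handing back the pointer its init function returned.
 * Returns the number of modules that were initialized. */
static int toy_run_transformation(void) {
    void *data[TOY_MAX_MODULES];
    int i;

    for (i = 0; i < toy_nb_modules; i++)
        data[i] = toy_modules[i].init(toy_modules[i].uri);
    /* ... the stylesheet would be applied here ... */
    for (i = 0; i < toy_nb_modules; i++)
        if (toy_modules[i].shutdown != NULL)
            toy_modules[i].shutdown(toy_modules[i].uri, data[i]);
    return toy_nb_modules;
}

/* Example module: its "database connection" is just a counter. */
static int toy_open_connections = 0;

static void *toy_db_init(const char *uri) {
    (void) uri;
    toy_open_connections++;
    return &toy_open_connections;
}

static void toy_db_shutdown(const char *uri, void *data) {
    (void) uri;
    assert(data == &toy_open_connections);  /* same pointer init returned */
    toy_open_connections--;
}
```

Because the data pointer is created per transformation and handed back at shutdown, a module does not need global state to track its resources; the counter here is global only so the effect is easy to observe.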

    - -

    Future work

    - +

    Future work

    Well some of the pieces missing:

      -
    • a way to load shared libraries to instantiate new modules
    • -
    • a better detection of extension function usage and their registration +
    • a way to load shared libraries to instantiate new modules
    • +
    • a better detection of extension function usage and their registration without having to use the extension prefix which ought to be reserved to element extensions.
    • -
    • more examples
    • -
    • implementations of the EXSLT common - extension libraries, I probably won't have the time needed to do this but - this would be a great contribution. -

      -
    • +
    • more examples
    • +
    • implementations of the EXSLT common + extension libraries, Thomas Broyer nearly finished implementing them.
    - +

    Daniel Veillard

    - -

    $Id$

    +
    diff --git a/doc/help.html b/doc/help.html index 8d0d783c..325df3db 100644 --- a/doc/help.html +++ b/doc/help.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet diff --git a/doc/index.html b/doc/index.html index f49e383d..77f96c3f 100644 --- a/doc/index.html +++ b/doc/index.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet @@ -58,19 +60,19 @@ A:link, A:visited, A:active { text-decoration: underline }
    -

    Libxslt is the XSLT C library developed for the Gnome project. XSLT -itself is an XML language to define transformations for XML. Libxslt is -based on libxml2, the XML C library developed for the Gnome project.

    +

    Libxslt is the XSLT C library +developed for the Gnome project. XSLT itself is an XML language to define +transformations for XML. Libxslt is based on libxml2, the XML C library developed for the +Gnome project. It also implements most of the EXSLT set of extension +functions and some of Saxon's evaluate and expression extensions.

    People can either embed the library in their application or use xsltproc -the command line processing tool.

    +the command line processing tool. This library is free software and can be +reused in commercial applications (see the intro)

    External documents:

    diff --git a/doc/internals.html b/doc/internals.html index 1d0d882d..ab3cbfc1 100644 --- a/doc/internals.html +++ b/doc/internals.html @@ -1,186 +1,188 @@ - The XSLT C library for Gnome explained - - + + +Library internals - - -

    - -

    The XSLT C library for Gnome explained

    - -

    How does it work ?

    - -

    - -

    Location: http://xmlsoft.org/XSLT/internals.html

    - -

    Libxslt home page: http://xmlsoft.org/XSLT/

    - -

    mailing-list archives: http://mail.gnome.org/archives/xslt/

    - -

    Version: $Revision$

    - -

    Table of contents

    + + + + +
    +Gnome LogoRed Hat Logo +
    +

    The XSLT C library for Gnome

    +

    Library internals

    +
    +
    + + +
    + + + +
    Main Menu
    + + + +
    Related links
    +
    +

    Table of contents

    - -

    Introduction

    - -

    This document describes the processing of libxslt, the XSLT C library developed for the Gnome project.

    - +

    Introduction

    +

    This document describes the processing of libxslt, the XSLT C library developed for the Gnome project.

    Note: this documentation is by definition incomplete and I am not good at -spelling, grammar, so patches and suggestions are really welcome.

    - -

    Basics

    - +spelling, grammar, so patches and suggestions are really welcome.

    +

    Basics

    XSLT is a transformation language. It takes an input document and a stylesheet document and generates an output document:

    - -

    - -

    Libxslt is written in C. It relies on libxml, - the XML C library for Gnome, for the following operations:

    +

    the XSLT processing model

    +

    Libxslt is written in C. It relies on libxml, the XML C library for Gnome, for +the following operations:

      -
    • parsing files
    • -
    • building the in-memory DOM structure associated with the documents - handled
    • -
    • the XPath implementation
    • -
    • serializing back the result document to XML and HTML. (Text is handled +
    • parsing files
    • +
    • building the in-memory DOM structure associated with the documents + handled
    • +
    • the XPath implementation
    • +
    • serializing back the result document to XML and HTML. (Text is handled directly.)
    - -

    Keep it simple stupid

    - +

    Keep it simple stupid

    Libxslt is not very specialized. It is built under the assumption that all -nodes from the source and output document can fit in the virtual memory of the -system. There is a big trade-off there. It is fine for reasonably sized +nodes from the source and output document can fit in the virtual memory of +the system. There is a big trade-off there. It is fine for reasonably sized documents but may not be suitable for large sets of data. The gain is that it can be used in a relatively versatile way. The input or output may never be -serialized, but the size of documents it can handle is limited by the size of -the memory available.

    - -

    More specialized memory handling approaches are possible, like building the -input tree from a serialization progressively as it is consumed, factoring -repetitive patterns, or even on-the-fly generation of the output as the input -is parsed but it is possible only for a limited subset of the stylesheets. In -general the implementation of libxslt follows the following pattern:

    +serialized, but the size of documents it can handle is limited by the size +of the memory available.

    +

    More specialized memory handling approaches are possible, like building +the input tree from a serialization progressively as it is consumed, +factoring repetitive patterns, or even on-the-fly generation of the output as +the input is parsed but it is possible only for a limited subset of the +stylesheets. In general the implementation of libxslt follows the following +pattern:

      -
    • KISS (keep it simple stupid)
    • -
    • when there is a clear bottleneck optimize on top of this simple +
    • KISS (keep it simple stupid)
    • +
    • when there is a clear bottleneck optimize on top of this simple framework and refine only as much as is needed to reach the expected result
    -

    The result is not that bad; clearly one can do a better job but more -specialized too. Most optimizations like building the tree on-demand would need -serious changes to the libxml XPath framework. An easy step would be to +specialized too. Most optimizations like building the tree on-demand would +need serious changes to the libxml XPath framework. An easy step would be to serialize the output directly (or call a set of SAX-like output handlers to keep this a flexible interface) and hence avoid the memory consumption of the result.

    - -

    The libxml nodes

    - -

    DOM-like trees, as used and generated by libxml and libxslt, are relatively -complex. Most node types follow the given structure except a few variations -depending on the node type:

    - +

    The libxml nodes

    +

    DOM-like trees, as used and generated by libxml and libxslt, are +relatively complex. Most node types follow the given structure except a few +variations depending on the node type:

    description of a libxml node

    -

    Nodes carry a name and the node type indicates the kind of node it represents, the most common ones are:

      -
    • document nodes
    • -
    • element nodes
    • -
    • text nodes
    • +
    • document nodes
    • +
    • element nodes
    • +
    • text nodes
    -

    For the XSLT processing, entity nodes should not be generated (i.e. they should be replaced by their content). Most nodes also contain the following -"navigation" information:

    +"navigation" information:

      -
    • the containing document
    • -
    • the parent node
    • -
    • the first child node
    • -
    • the last child node
    • -
    • the previous sibling
    • -
    • the following sibling (next)
    • +
    • the containing document
    • +
    • the parent node
    • +
    • the first child node
    • +
    • the last child node
    • +
    • the previous sibling
    • +
    • the following sibling (next)
    -

    Element nodes carry the list of attributes in properties; an attribute itself holds the navigation pointers and the children list (the attribute value is not represented as a simple string, to allow the use of entity references).

    -

    The ns points to the namespace declaration for the -namespace associated to the node, nsDef is the linked list of -namespace declarations present on element nodes.

    +namespace associated to the node, nsDef is the linked list +of namespace declarations present on element nodes.

    Most nodes also carry an _private pointer which can be used by the application to hold specific data on this node.
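The navigation pointers listed above are enough to walk a whole subtree without recursion. Here is a minimal sketch using a simplified, hypothetical `ToyNode` structure (it keeps only the navigation pointers and a name; it is not the libxml node definition) together with two hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node holding only the navigation pointers. */
typedef struct ToyNode {
    const char *name;
    struct ToyNode *parent;
    struct ToyNode *children;   /* first child */
    struct ToyNode *last;       /* last child */
    struct ToyNode *prev;       /* previous sibling */
    struct ToyNode *next;       /* following sibling */
} ToyNode;

/* Append child at the end of parent's child list, keeping links consistent. */
static void toy_add_child(ToyNode *parent, ToyNode *child) {
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next = NULL;
    child->prev = parent->last;
    if (parent->last != NULL)
        parent->last->next = child;
    else
        parent->children = child;
    parent->last = child;
}

/* Count the nodes of the subtree rooted at root (root included), visiting
 * them in document order using only the navigation pointers. */
static int toy_count_subtree(const ToyNode *root) {
    const ToyNode *cur = root;
    int count = 0;

    while (cur != NULL) {
        count++;
        if (cur->children != NULL) {
            cur = cur->children;        /* go down first */
            continue;
        }
        /* No children: climb until a following sibling exists. */
        while (cur != root && cur->next == NULL)
            cur = cur->parent;
        if (cur == root)
            break;                      /* back at the top: done */
        cur = cur->next;
    }
    return count;
}
```

The same down, across, up pattern works on any tree that exposes parent, first child and next sibling links, which is why those three pointers are the ones most traversal code relies on.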

    - -

    The XSLT processing steps

    - +

    The XSLT processing steps

    There are a few steps which are clearly decoupled at the interface level:

      -
    1. parse the stylesheet and generate a DOM tree
    2. -
    3. take the stylesheet tree and build a compiled version of it (the +
    4. parse the stylesheet and generate a DOM tree
    5. +
    6. take the stylesheet tree and build a compiled version of it (the compilation phase)
    7. -
    8. take the input and generate a DOM tree
    9. -
    10. process the stylesheet against the input tree and generate an output +
    11. take the input and generate a DOM tree
    12. +
    13. process the stylesheet against the input tree and generate an output tree
    14. -
    15. serialize the output tree
    16. +
    17. serialize the output tree
    -

    A few things should be noted here:

      -
    • the steps 1/ 3/ and 5/ are optional
    • -
    • the stylesheet obtained at 2/ can be reused by multiple processing 4/ +
    • the steps 1/ 3/ and 5/ are optional
    • +
    • the stylesheet obtained at 2/ can be reused by multiple processing 4/ (and this should also work in threaded programs)
    • -
    • the tree provided in 2/ should never be freed using xmlFreeDoc, but by +
    • the tree provided in 2/ should never be freed using xmlFreeDoc, but by freeing the stylesheet.
    • -
    • the input tree 4/ is not modified except the _private field which may be - used for labelling keys if used by the stylesheet
    • +
    • the input tree 4/ is not modified except the _private field which may + be used for labelling keys if used by the stylesheet
    - -

    The XSLT stylesheet compilation

    - +

    The XSLT stylesheet compilation

    This is the second step described. It takes a stylesheet tree, and -"compiles" it. This associates to each node a structure stored in the +"compiles" it. This associates to each node a structure stored in the _private field and containing information computed in the stylesheet:

    - -

    - +

    a compiled XSLT stylesheet

    One xsltStylesheet structure is generated per document parsed for the stylesheet. XSLT documents allow includes and imports of other documents, imports are stored in the imports list (hence keeping the @@ -188,121 +190,103 @@ tree hierarchy of includes which is very important for a proper XSLT processing model) and includes are stored in the doclist list. An imported stylesheet has a parent link to allow browsing of the tree.

    -

    The DOM tree associated to the document is stored in doc. It is preprocessed to remove ignorable empty nodes and all the nodes in the XSLT namespace are subject to precomputing. This usually consists of extracting all the context information from the context tree (attributes, namespaces, XPath expressions), and storing it in an xsltStylePreComp structure associated to the _private field of the node.

    -

    A couple of notable exceptions to this are XSLT template nodes (more on -this later) and attribute value templates. If they are actually templates, the -value cannot be computed at compilation time. (Some preprocessing could be done -like isolation and preparsing of the XPath subexpressions but it's not done, -yet.)

    - -

    The xsltStylePreComp structure also allows storing of the precompiled form of -an XPath expression that can be associated to an XSLT element (more on this -later).

    - -

    The XSLT template compilation

    - +this later) and attribute value templates. If they are actually templates, +the value cannot be computed at compilation time. (Some preprocessing could +be done like isolation and preparsing of the XPath subexpressions but it's +not done, yet.)

    +

    The xsltStylePreComp structure also allows storing of the precompiled form +of an XPath expression that can be associated to an XSLT element (more on +this later).

    +

    The XSLT template compilation

    A proper handling of templates lookup is one of the keys of fast XSLT -processing. (Given a node in the source document this is the process of finding -which templates should be applied to this node.) Libxslt follows the hint -suggested in the 5.2 Patterns -section of the XSLT Recommendation, i.e. it doesn't evaluate it as an XPath -expression but tokenizes it and compiles it as a set of rules to be evaluated on -a candidate node. There usually is an indication of the node name in the last -step of this evaluation and this is used as a key check for the match. As a -result libxslt builds a relatively more complex set of structures for the -templates:

    - -

    - +processing. (Given a node in the source document this is the process of +finding which templates should be applied to this node.) Libxslt follows the +hint suggested in the 5.2 +Patterns section of the XSLT Recommendation, i.e. it doesn't evaluate it +as an XPath expression but tokenizes it and compiles it as a set of rules to +be evaluated on a candidate node. There usually is an indication of the node +name in the last step of this evaluation and this is used as a key check for +the match. As a result libxslt builds a relatively more complex set of +structures for the templates:

    +

    The templates related structure

    Let's describe a bit more closely what is built. First the xsltStylesheet structure holds a pointer to the template hash table. All the XSLT patterns -compiled in this stylesheet are indexed by the value of the target element -(or attribute, pi ...) name, so when an element or an attribute "foo" needs to -be processed the lookup is done using the name as a key.

    - +compiled in this stylesheet are indexed by the value of the target +element (or attribute, pi ...) name, so when an element or an attribute "foo" +needs to be processed the lookup is done using the name as a key.

    Each of the patterns is compiled into an xsltCompMatch structure. It holds -the set of rules based on the tokenization of the pattern stored in -reverse order (matching is easier this way). It also holds some information -about the previous matches used to speed up the process when one iterates over -a set of siblings. (This optimization may be defeated by trashing when running +the set of rules based on the tokenization of the pattern stored in reverse +order (matching is easier this way). It also holds some information about the +previous matches used to speed up the process when one iterates over a set of +siblings. (This optimization may be defeated by trashing when running threaded computation, it's unclear that this is a big deal in practice.) Predicate expressions are not compiled at this stage, they may be at run-time if needed, but in this case they are compiled as full XPath expressions (the use of some fixed predicate can probably be optimized, they are not yet).

    -

    The xsltCompMatch are then stored in the hash table, the clash list is -itself sorted by priority of the template to implement "naturally" the XSLT +itself sorted by priority of the template to implement "naturally" the XSLT priority rules.

    -

    Associated to the compiled pattern is the xsltTemplate itself containing -the information required for the processing of the pattern including, -of course, a pointer to the list of elements used for building the pattern +the information required for the processing of the pattern including, of +course, a pointer to the list of elements used for building the pattern result.

    -

    Last but not least a number of patterns do not fit in the hash table because they are not associated to a name, this is the case for patterns applying to the root, any element, any attributes, text nodes, pi nodes, keys etc. Those are stored independently in the stylesheet structure as separate linked lists of xsltCompMatch.

    - -

    The processing itself

    - -

    The processing is defined by the XSLT specification (the -basis of the algorithm is explained in the Introduction +

    The processing itself

    +

    The processing is defined by the XSLT specification (the basis of the +algorithm is explained in the Introduction section). Basically it works by taking the root of the input document and applying the following algorithm:

      -
    1. Finding the template applying to it. This is a lookup in the - template hash table, walking the hash list until the node satisfies all - the steps of the pattern, then checking the appropriate(s) global - templates to see if there isn't a higher priority rule to apply
    2. -
    3. If there is no template, apply the default rule (recurse on the +
    4. Finding the template applying to it. This is a lookup in the template + hash table, walking the hash list until the node satisfies all the steps + of the pattern, then checking the appropriate(s) global templates to see + if there isn't a higher priority rule to apply
    5. +
    6. If there is no template, apply the default rule (recurse on the children)
    7. -
    8. else walk the content list of the selected templates, for each of them: +
    9. else walk the content list of the selected templates, for each of them:
        -
      • if the node is in the XSLT namespace then the node has a _private +
      • if the node is in the XSLT namespace then the node has a _private field pointing to the preprocessed values, jump to the specific code
      • -
      • if the node is in an extension namespace, look up the associated +
      • if the node is in an extension namespace, look up the associated behavior
      • -
      • otherwise copy the node.
      • -
      -

      The closure is usually done through the XSLT +

    10. otherwise copy the node.
    11. + +

      The closure is usually done through the XSLT apply-templates construct recursing by applying the adequate template on the input node children or on the result of an associated XPath selection lookup.

      - +
    -

    Note that large parts of the input tree may not be processed by a given -stylesheet and that on the opposite some may be processed multiple times. (This -often is the case when a Table of Contents is built).

    - +stylesheet and that on the opposite some may be processed multiple times. +(This often is the case when a Table of Contents is built).

    The module transform.c is the one implementing most of this -logic. xsltApplyStylesheet() is the entry point, it allocates -an xsltTransformContext containing the following:

    +logic. xsltApplyStylesheet() is the entry point, it +allocates an xsltTransformContext containing the following:

      -
    • a pointer to the stylesheet being processed
    • -
    • a stack of templates
    • -
    • a stack of variables and parameters
    • -
    • an XPath context
    • -
    • the template mode
    • -
    • current document
    • -
    • current input node
    • -
    • current selected node list
    • -
    • the current insertion points in the output document
    • -
    • a couple of hash tables for extension elements and functions
    • +
    • a pointer to the stylesheet being processed
    • +
    • a stack of templates
    • +
    • a stack of variables and parameters
    • +
    • an XPath context
    • +
    • the template mode
    • +
    • current document
    • +
    • current input node
    • +
    • current selected node list
    • +
    • the current insertion points in the output document
    • +
    • a couple of hash tables for extension elements and functions
    -

    Then a new document gets allocated (HTML or XML depending on the type of output), the user parameters and global variables and parameters are evaluated. Then xsltProcessOneNode() which implements the @@ -310,19 +294,15 @@ evaluated. Then xsltProcessOneNode() which implements the implemented by calling xsltGetTemplate(), step 2/ is implemented by xsltDefaultProcessOneNode() and step 3/ is implemented by xsltApplyOneTemplate().

    - -

    XPath expression compilation

    - -

    The XPath support is actually implemented in the libxml module (where it is -reused by the XPointer implementation). XPath is a relatively classic +

    XPath expression compilation

    +

    The XPath support is actually implemented in the libxml module (where it +is reused by the XPointer implementation). XPath is a relatively classic expression language. The only uncommon feature is that it is working on XML trees and hence has specific syntax and types to handle them.

    - -

    XPath expressions are compiled using xmlXPathCompile(). It -will take an expression string in input and generate a structure containing -the parsed expression tree, for example the expression:

    +

    XPath expressions are compiled using xmlXPathCompile(). +It will take an expression string in input and generate a structure +containing the parsed expression tree, for example the expression:

    /doc/chapter[title='Introduction']
    -

    will be compiled as

    Compiled Expression : 10 elements
       SORT
    @@ -337,183 +317,147 @@ the parsed expression tree, for example the expression:

    ELEM Object is a string : Introduction COLLECT 'child' 'name' 'node' title NODE
    -

    This can be tested using the testXPath command (in the libxml codebase) using the --tree option.

    - -

    Again, the KISS approach is used. No optimization is done. This could be an -interesting thing to add. Michael +

    Again, the KISS approach is used. No optimization is done. This could be +an interesting thing to add. Michael Kay describes a lot of possible and interesting optimizations done in Saxon which would be possible at this level. I'm unsure they would provide much gain since the expressions tends to be relatively simple in general and stylesheets are still hand generated. Optimizations at the interpretation sounds likely to be more efficient.

    - -

    XPath interpretation

    - +

    XPath interpretation

    The interpreter is implemented by xmlXPathCompiledEval() which is the front-end to xmlXPathCompOpEval() the function implementing the evaluation of the expression tree. This evaluation follows the KISS approach again. It's recursive and calls xmlXPathNodeCollectAndTest() to collect nodes set when evaluating a COLLECT node.

    - -

    An evaluation is done within the framework of an XPath context stored in an -xmlXPathContext structure, in the framework of a +

    An evaluation is done within the framework of an XPath context stored in +an xmlXPathContext structure, in the framework of a transformation the context is maintained within the XSLT context. Its content follows the requirements from the XPath specification:

      -
    • the current document
    • -
    • the current node
    • -
    • a hash table of defined variables (but not used by XSLT)
    • -
    • a hash table of defined functions
    • -
    • the proximity position (the place of the node in the current node +
    • the current document
    • +
    • the current node
    • +
    • a hash table of defined variables (but not used by XSLT)
    • +
    • a hash table of defined functions
    • +
    • the proximity position (the place of the node in the current node list)
    • -
    • the context size (the size of the current node list)
    • -
    • the array of namespace declarations in scope (there also is a namespace +
    • the context size (the size of the current node list)
    • +
    • the array of namespace declarations in scope (there also is a namespace hash table but it is not used in the XSLT transformation).
    -

    For the purpose of XSLT an extra pointer has been added -allowing to retrieve the XSLT transformation context. When an XPath evaluation -is about to be performed, an XPath parser context is allocated containing and -XPath object stack (this is actually an XPath evaluation context, this is a -remain of the time where there was no separate parsing and evaluation phase in -the XPath implementation). Here is an overview of the set of contexts -associated to an XPath evaluation within an XSLT transformation:

    - -

    - -

    Clearly this is a bit too complex and confusing and should be refactored at -the next set of binary incompatible releases of libxml. For example the +allowing to retrieve the XSLT transformation context. When an XPath +evaluation is about to be performed, an XPath parser context is allocated +containing and XPath object stack (this is actually an XPath evaluation +context, this is a remain of the time where there was no separate parsing and +evaluation phase in the XPath implementation). Here is an overview of the set +of contexts associated to an XPath evaluation within an XSLT +transformation:

    +

    The set of contexts associated

    +

    Clearly this is a bit too complex and confusing and should be refactored +at the next set of binary incompatible releases of libxml. For example the xmlXPathCtxt has a lot of unused parts and should probably be merged with xmlXPathParserCtxt.

    - -

    Description of XPath Objects

    - +

    Description of XPath Objects

    An XPath expression manipulates XPath objects. XPath defines the default types boolean, numbers, strings and node sets. XSLT adds the result tree fragment type which is basically an unmodifiable node set.

    -

    Implementation-wise, libxml follows again a KISS approach, the xmlXPathObject is a structure containing a type description and the various possibilities. (Using an enum could have gained some bytes.) In the case of node sets (or result tree fragments), it points to a separate xmlNodeSet object which contains the list of pointers to the document nodes:

    - -

    - +

    An Node set object pointing to

    The XPath API (and its 'internal' part) includes a number of functions to create, copy, compare, convert or free XPath objects.

    - -

    XPath functions

    - +

    XPath functions

    All the XPath functions available to the interpreter are registered in the function hash table linked from the XPath context. They all share the same signature:

    void xmlXPathFunc (xmlXPathParserContextPtr ctxt, int nargs);
    -

    The first argument is the XPath interpretation context, holding the -interpretation stack. The second argument defines the number of objects passed -on the stack for the function to consume (last argument is on top of the -stack).

    - +interpretation stack. The second argument defines the number of objects +passed on the stack for the function to consume (last argument is on top of +the stack).

    Basically an XPath function does the following:

      -
    • check nargs for proper handling of errors or functions with - variable numbers of parameters
    • -
    • pop the parameters from the stack using obj = - valuePop(ctxt);
    • -
    • do the function specific computation
    • -
    • push the result parameter on the stack using valuePush(ctxt, - res);
    • -
    • free up the input parameters with - xmlXPathFreeObject(obj);
    • -
    • return
    • +
    • check nargs for proper handling of errors or functions + with variable numbers of parameters
    • +
    • pop the parameters from the stack using obj = + valuePop(ctxt); +
    • +
    • do the function specific computation
    • +
    • push the result parameter on the stack using valuePush(ctxt, + res); +
    • +
    • free up the input parameters with + xmlXPathFreeObject(obj); +
    • +
    • return
    -

    Sometimes the work can be done directly by modifying in place the top object on the stack, ctxt->value.
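The calling convention described above can be sketched with a toy stack model. This is only an illustration under stated assumptions: `toy_object`, `toy_context`, `toy_push`, `toy_pop` and `toy_subtract` are hypothetical names, not libxml types or functions, and the objects are plain values, so the "free the inputs" step has nothing to do here.

```c
#include <assert.h>

/* Toy stand-ins for the XPath object and the parser context.
 * Hypothetical names: the real libxml structures carry much more state. */
typedef struct { double number; } toy_object;

typedef struct {
    toy_object stack[16];  /* evaluation stack, last argument on top */
    int depth;             /* number of objects currently on the stack */
    int error;             /* set when a function is called incorrectly */
} toy_context;

void toy_push(toy_context *ctxt, toy_object obj) {
    assert(ctxt->depth < 16);
    ctxt->stack[ctxt->depth++] = obj;
}

toy_object toy_pop(toy_context *ctxt) {
    assert(ctxt->depth > 0);
    return ctxt->stack[--ctxt->depth];
}

/* Same shape as the signature above: a context plus an argument count. */
void toy_subtract(toy_context *ctxt, int nargs) {
    if (nargs != 2 || ctxt->depth < 2) {   /* check nargs */
        ctxt->error = 1;
        return;
    }
    toy_object rhs = toy_pop(ctxt);        /* last argument is on top */
    toy_object lhs = toy_pop(ctxt);
    toy_object res = { lhs.number - rhs.number };  /* the computation */
    toy_push(ctxt, res);                   /* push the result */
    /* nothing to free: the toy objects are plain values */
}
```

Popping in reverse order matters for non-commutative functions: the second argument comes off the stack first.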

    - -

    The XSLT variables stack frame

    - +

    The XSLT variables stack frame

    Not to be confused with XPath object stack, this stack holds the XSLT variables and parameters as they are defined through the recursive calls of call-template, apply-templates and default templates. This is used to define the scope of variables being called.

    - -

    This part seems to be the most urgent attention right now, first it is done -in a very inefficient way since the location of the variables and +

    This part seems to need the most urgent attention right now. First, it is +done in a very inefficient way, since the location of the variables and parameters within the stylesheet tree is still done at run time (it really should be done statically at compile time), and I am still unsure that my understanding of the template variables and parameter scope is actually right.

    - -

    This part of the documentation is still to be written once this part of the -code will be stable. TODO

    - -

    Extension support

    - +

    This part of the documentation is still to be written once this part of +the code will be stable. TODO +

    +

    Extension support

    There is a separate document explaining how the -extension support works.

    - -

    Further reading

    - -

    Michael Kay wrote a +extension support works.

    +

    Further reading

    +

    Michael Kay wrote a really interesting article on Saxon internals and the work he did on performance issues. I wish I had read it before starting the libxslt design (I would probably have avoided a few mistakes and progressed faster). A lot of the ideas in his papers should be implemented or at least tried in libxslt.

    - -

    The libxml documentation, especially the I/O interfaces and the memory management.

    - -

    TODOs

    - +

    The libxml documentation, especially the I/O interfaces and the memory management.

    +

    TODOs

    redesign the XSLT stack frame handling. Far too much work is done at -execution time. Similarly for the attribute value templates handling, at least -the embedded subexpressions ought to be precompiled.

    - +execution time. Similarly for the attribute value templates handling, at +least the embedded subexpressions ought to be precompiled.

    Allow output to be saved to a SAX like output (this notion of SAX like API for output should be added directly to libxml).

    -

    Implement and test some of the optimization explained by Michael Kay especially:

      -
    • static slot allocation on the stack frame
    • -
    • specific boolean interpretation of an XPath expression
    • -
    • some of the sorting optimization
    • -
    • Lazy evaluation of location path. (this may require more changes but +
    • static slot allocation on the stack frame
    • +
    • specific boolean interpretation of an XPath expression
    • +
    • some of the sorting optimization
    • +
    • Lazy evaluation of location path. (this may require more changes but sounds really interesting. XT does this too.)
    • -
    • Optimization of an expression tree (This could be done as a completely +
    • Optimization of an expression tree (This could be done as a completely independent module.)
    - -

    -Error reporting, there is a lot of case where the XSLT specification specify -that a given construct is an error are not checked adequately by libxslt. -Basically one should do a complete pass on the XSLT spec again and add all -tests to the stylesheet compilation. Using the DTD provided in the appendix and -making direct checks using the libxml validation API sounds a good idea too -(though one should take care of not raising errors for elements/attributes in -different namespaces). - +

    +

    Error reporting: there are a lot of cases where the XSLT specification +specifies that a given construct is an error but they are not checked adequately by +libxslt. Basically one should do a complete pass on the XSLT spec again and +add all tests to the stylesheet compilation. Using the DTD provided in the +appendix and making direct checks using the libxml validation API sounds like a +good idea too (though one should take care not to raise errors for +elements/attributes in different namespaces).

    Double check all the places where the stylesheet compiled form might be modified at run time (extra removal of blanks nodes, hint on the xsltCompMatch).

    - -

    - +

    Daniel Veillard

    - -

    $Id$

    +
    diff --git a/doc/intro.html b/doc/intro.html index 773d33c3..df876994 100644 --- a/doc/intro.html +++ b/doc/intro.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet diff --git a/doc/news.html b/doc/news.html index 3cd66cfb..1856c09f 100644 --- a/doc/news.html +++ b/doc/news.html @@ -36,6 +36,8 @@ A:link, A:visited, A:active { text-decoration: underline }
  • News
  • The xsltproc tool
  • The programming API
  • +
  • Library internals
  • +
  • Writing extensions
  • Contributions
  • flat page, stylesheet diff --git a/doc/site.xsl b/doc/site.xsl index 18ae0bcc..2653cfd1 100644 --- a/doc/site.xsl +++ b/doc/site.xsl @@ -7,37 +7,43 @@ - + intro.html - + docs.html - + bugs.html - + help.html - + help.html - + downloads.html - + news.html - + contribs.html - + xsltproc2.html - + API.html - + + extensions.html + + + internals.html + + unknown.html diff --git a/doc/xslt.html b/doc/xslt.html index ca95d1c9..52c4b81d 100644 --- a/doc/xslt.html +++ b/doc/xslt.html @@ -12,20 +12,21 @@

    libxslt

    -

    Libxslt is the XSLT C library developped for the Gnome project. XSLT -itself is a an XML language to define transformation for XML. Libxslt is -based on libxml2 the XML C library developped for the Gnome project.

    +

    Libxslt is the XSLT C library +developed for the Gnome project. XSLT itself is an XML language to define +transformations for XML. Libxslt is based on libxml2, the XML C library developed for the +Gnome project. It also implements most of the EXSLT set of extension +functions and some of Saxon's evaluate and expressions extensions.

    People can either embed the library in their application or use xsltproc -the command line processing tool.

    +the command line processing tool. This library is free software and can be +reused in commercial applications (see the intro)

    External documents:

    @@ -482,6 +483,913 @@ processing needs and environment for example if reading/saving from/to memory, or if you want to apply XInclude processing to the stylesheet or input documents.

    +

    Library internals

    + +

    Table of contents

    + + +

    Introduction

    + +

    This document describes the processing of libxslt, the XSLT C library developed for the Gnome project.

    + +

    Note: this documentation is by definition incomplete and I am not good at +spelling, grammar, so patches and suggestions are really welcome.

    + +

    Basics

    + +

    XSLT is a transformation language. It takes an input document and a +stylesheet document and generates an output document:

    + +

    + +

    Libxslt is written in C. It relies on libxml, the XML C library for Gnome, for +the following operations:

    +
      +
    • parsing files
    • +
    • building the in-memory DOM structure associated with the documents + handled
    • +
    • the XPath implementation
    • +
    • serializing back the result document to XML and HTML. (Text is handled + directly.)
    • +
    + +

    Keep it simple stupid

    + +

    Libxslt is not very specialized. It is built under the assumption that all +nodes from the source and output documents can fit in the virtual memory of +the system. There is a big trade-off there. It is fine for reasonably sized +documents but may not be suitable for large sets of data. The gain is that it +can be used in a relatively versatile way. The input or output may never be +serialized, but the size of documents it can handle is limited by the size +of the memory available.

    + +

    More specialized memory handling approaches are possible, like building +the input tree from a serialization progressively as it is consumed, +factoring repetitive patterns, or even on-the-fly generation of the output as +the input is parsed but it is possible only for a limited subset of the +stylesheets. In general the implementation of libxslt follows the following +pattern:

    +
      +
    • KISS (keep it simple stupid)
    • +
    • when there is a clear bottleneck optimize on top of this simple + framework and refine only as much as is needed to reach the expected + result
    • +
    + +

    The result is not that bad; clearly one can do a better job, but only by being more +specialized too. Most optimizations, like building the tree on demand, would +need serious changes to the libxml XPath framework. An easy step would be to +serialize the output directly (or call a set of SAX-like output handlers to +keep this a flexible interface) and hence avoid the memory consumption of the +result.

    + +

    The libxml nodes

    + +

    DOM-like trees, as used and generated by libxml and libxslt, are +relatively complex. Most node types follow the given structure except a few +variations depending on the node type:

    + +

    description of a libxml node

    + +

    Nodes carry a name and the node type +indicates the kind of node it represents, the most common ones are:

    +
      +
    • document nodes
    • +
    • element nodes
    • +
    • text nodes
    • +
    + +

    For the XSLT processing, entity nodes should not be generated (i.e. they +should be replaced by their content). Most nodes also contain the following +"navigation" information:

    +
      +
    • the containing document
    • +
    • the parent node
    • +
    • the first child node
    • +
    • the last child node
    • +
    • the previous sibling
    • +
    • the following sibling (next)
    • +
    + +

    Element nodes carry the list of attributes in the properties field; an +attribute itself holds the navigation pointers and the children list (the +attribute value is not represented as a simple string, to allow usage of +entity references).

    + +

    The ns pointer points to the namespace declaration for the +namespace associated with the node; nsDef is the linked list +of namespace declarations present on element nodes.

    + +

    Most nodes also carry an _private pointer which can be +used by the application to hold specific data on this node.

    + +

    The XSLT processing steps

    + +

    There are a few steps which are clearly decoupled at the interface +level:

    +
      +
    1. parse the stylesheet and generate a DOM tree
    2. +
    3. take the stylesheet tree and build a compiled version of it (the + compilation phase)
    4. +
    5. take the input and generate a DOM tree
    6. +
    7. process the stylesheet against the input tree and generate an output + tree
    8. +
    9. serialize the output tree
    10. +
    + +

    A few things should be noted here:

    +
      +
    • the steps 1/ 3/ and 5/ are optional
    • +
    • the stylesheet obtained at 2/ can be reused by multiple processing 4/ + (and this should also work in threaded programs)
    • +
    • the tree provided in 2/ should never be freed using xmlFreeDoc, but by + freeing the stylesheet.
    • +
    • the input tree 4/ is not modified except the _private field which may + be used for labelling keys if used by the stylesheet
    • +
    + +

    The XSLT stylesheet compilation

    + +

    This is the second step described. It takes a stylesheet tree, and +"compiles" it. This associates to each node a structure stored in the +_private field and containing information computed in the stylesheet:

    + +

    + +

    One xsltStylesheet structure is generated per document parsed for the +stylesheet. XSLT documents allow includes and imports of other documents, +imports are stored in the imports list (hence keeping the +tree hierarchy of includes which is very important for a proper XSLT +processing model) and includes are stored in the doclist +list. An imported stylesheet has a parent link to allow browsing of the +tree.

    + +

    The DOM tree associated to the document is stored in doc. +It is preprocessed to remove ignorable empty nodes and all the nodes in the +XSLT namespace are subject to precomputing. This usually consists of +extracting all the context information from the context tree (attributes, +namespaces, XPath expressions), and storing it in an xsltStylePreComp +structure associated to the _private field of the node.

    + +

    A couple of notable exceptions to this are XSLT template nodes (more on +this later) and attribute value templates. If they are actually templates, +the value cannot be computed at compilation time. (Some preprocessing could +be done like isolation and preparsing of the XPath subexpressions but it's +not done, yet.)

    + +

    The xsltStylePreComp structure also allows storing of the precompiled form +of an XPath expression that can be associated to an XSLT element (more on +this later).

    + +

    The XSLT template compilation

    + +

    A proper handling of templates lookup is one of the keys of fast XSLT +processing. (Given a node in the source document this is the process of +finding which templates should be applied to this node.) Libxslt follows the +hint suggested in the 5.2 +Patterns section of the XSLT Recommendation, i.e. it doesn't evaluate it +as an XPath expression but tokenizes it and compiles it as a set of rules to +be evaluated on a candidate node. There usually is an indication of the node +name in the last step of this evaluation and this is used as a key check for +the match. As a result libxslt builds a relatively more complex set of +structures for the templates:

    + +

    + +

    Let's describe a bit more closely what is built. First the xsltStylesheet +structure holds a pointer to the template hash table. All the XSLT patterns +compiled in this stylesheet are indexed by the value of the target +element (or attribute, pi ...) name, so when an element or an attribute "foo" +needs to be processed the lookup is done using the name as a key.

    + +

    Each of the patterns is compiled into an xsltCompMatch structure. It holds +the set of rules based on the tokenization of the pattern stored in reverse +order (matching is easier this way). It also holds some information about the +previous matches used to speed up the process when one iterates over a set of +siblings. (This optimization may be defeated by thrashing when running +threaded computations; it's unclear whether this is a big deal in practice.) +Predicate expressions are not compiled at this stage; they may be at run-time +if needed, but in this case they are compiled as full XPath expressions (the +use of some fixed predicates can probably be optimized, but they are not yet).
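Why the reverse order makes matching easier can be seen in a small sketch. It assumes a much simpler model than xsltCompMatch: only child steps separated by `/`, no predicates and no `//`, and the names `toy_node`, `toy_comp_match` and `toy_match` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_node {
    const char *name;
    struct toy_node *parent;   /* NULL above the topmost element */
} toy_node;

/* Steps of a pattern such as "doc/chapter/title", stored last step
 * first: { "title", "chapter", "doc" }. */
typedef struct {
    const char *steps[8];
    int nsteps;
} toy_comp_match;

/* Returns 1 when the candidate node satisfies every step of the pattern,
 * walking from the node up through its parents. */
int toy_match(const toy_comp_match *comp, const toy_node *node) {
    assert(comp->nsteps <= 8);
    for (int i = 0; i < comp->nsteps; i++) {
        if (node == NULL || strcmp(node->name, comp->steps[i]) != 0)
            return 0;          /* most candidates fail on the first step */
        node = node->parent;
    }
    return 1;
}
```

Since the candidate node is the starting point, the last step of the pattern is the first one that can be tested, and each following step only needs the parent pointer.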

    + +

    The xsltCompMatch are then stored in the hash table, the clash list is +itself sorted by priority of the template to implement "naturally" the XSLT +priority rules.
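A minimal sketch of a hash table whose clash lists stay sorted by priority is given here. All names (`toy_entry`, `toy_table`, `toy_add`, `toy_lookup`) are hypothetical, the hash function is arbitrary, and letting ties go to the entry added last is an assumption made for the example, not a statement about the libxslt code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_entry {
    const char *name;        /* hash key: the name tested by the last step */
    double priority;         /* priority of the associated template */
    int template_id;         /* stand-in for a pointer to the template */
    struct toy_entry *next;  /* next entry in the same clash list */
} toy_entry;

#define TOY_BUCKETS 8
typedef struct { toy_entry *buckets[TOY_BUCKETS]; } toy_table;

static unsigned toy_hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % TOY_BUCKETS;
}

/* Insert while keeping the clash list sorted by decreasing priority;
 * on equal priority the entry added last comes first. */
void toy_add(toy_table *t, toy_entry *e) {
    assert(t != NULL && e != NULL);
    toy_entry **link = &t->buckets[toy_hash(e->name)];
    while (*link != NULL && (*link)->priority > e->priority)
        link = &(*link)->next;
    e->next = *link;
    *link = e;
}

/* The first entry with the right name is the highest-priority one. */
const toy_entry *toy_lookup(const toy_table *t, const char *name) {
    for (const toy_entry *e = t->buckets[toy_hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}
```

Because the ordering is paid for once at insertion time, a lookup can stop at the first entry whose pattern matches.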

    + +

    Associated to the compiled pattern is the xsltTemplate itself containing +the information required for the processing of the pattern including, of +course, a pointer to the list of elements used for building the pattern +result.

    + +

    Last but not least a number of patterns do not fit in the hash table +because they are not associated to a name, this is the case for patterns +applying to the root, any element, any attributes, text nodes, pi nodes, keys +etc. Those are stored independently in the stylesheet structure as separate +linked lists of xsltCompMatch.

    + +

    The processing itself

    + +

    The processing is defined by the XSLT specification (the basis of the +algorithm is explained in the Introduction +section). Basically it works by taking the root of the input document and +applying the following algorithm:

    +
      +
    1. Finding the template applying to it. This is a lookup in the template + hash table, walking the hash list until the node satisfies all the steps + of the pattern, then checking the appropriate global templates to see + if there isn't a higher priority rule to apply
    2. +
    3. If there is no template, apply the default rule (recurse on the + children)
    4. +
    5. else walk the content list of the selected templates, for each of them: +
        +
      • if the node is in the XSLT namespace then the node has a _private + field pointing to the preprocessed values, jump to the specific + code
      • +
      • if the node is in an extension namespace, look up the associated + behavior
      • +
      • otherwise copy the node.
      • +
      +

      The closure is usually done through the XSLT + apply-templates construct recursing by applying the + adequate template on the input node children or on the result of an + associated XPath selection lookup.

      +
    6. +
    + +
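The three steps above can be sketched on a toy tree. This is a deliberately reduced model, assuming that a template only matches on an element name and that its content is a single literal element wrapping an apply-templates on the children. The names `toy_node`, `toy_template`, `toy_find` and `toy_process` are hypothetical and unrelated to the functions in transform.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct toy_node {
    const char *name;
    struct toy_node *children[4];
    int nchildren;
} toy_node;

/* A toy template: the element name it matches and the literal element
 * it emits around the result of processing the children. */
typedef struct { const char *match; const char *tag; } toy_template;

static void toy_append(char *out, size_t size, const char *s) {
    size_t len = strlen(out);
    if (len < size)
        snprintf(out + len, size - len, "%s", s);  /* truncates if full */
}

/* Step 1: find the template applying to the node, if any. */
static const toy_template *toy_find(const toy_template *tpls, int ntpls,
                                    const char *name) {
    for (int i = 0; i < ntpls; i++)
        if (strcmp(tpls[i].match, name) == 0)
            return &tpls[i];
    return NULL;
}

void toy_process(const toy_template *tpls, int ntpls,
                 const toy_node *node, char *out, size_t size) {
    assert(node != NULL && out != NULL);
    const toy_template *tpl = toy_find(tpls, ntpls, node->name);
    if (tpl != NULL) {            /* step 3: copy the literal content */
        toy_append(out, size, "<");
        toy_append(out, size, tpl->tag);
        toy_append(out, size, ">");
    }
    /* Step 2 (default rule) and the apply-templates closure of step 3
     * both recurse on the children. */
    for (int i = 0; i < node->nchildren; i++)
        toy_process(tpls, ntpls, node->children[i], out, size);
    if (tpl != NULL) {
        toy_append(out, size, "</");
        toy_append(out, size, tpl->tag);
        toy_append(out, size, ">");
    }
}
```

A node without a template produces no output of its own, but its children are still visited, which is what the default rule amounts to in this model.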

    Note that large parts of the input tree may not be processed by a given +stylesheet and that, conversely, some may be processed multiple times. +(This is often the case when a Table of Contents is built).

    + +

    The module transform.c is the one implementing most of this +logic. xsltApplyStylesheet() is the entry point, it +allocates an xsltTransformContext containing the following:

    +
      +
    • a pointer to the stylesheet being processed
    • +
    • a stack of templates
    • +
    • a stack of variables and parameters
    • +
    • an XPath context
    • +
    • the template mode
    • +
    • current document
    • +
    • current input node
    • +
    • current selected node list
    • +
    • the current insertion points in the output document
    • +
    • a couple of hash tables for extension elements and functions
    • +
    + +

    Then a new document gets allocated (HTML or XML depending on the type of +output), the user parameters and global variables and parameters are +evaluated. Then xsltProcessOneNode() which implements the +1-2-3 algorithm is called on the root element of the input. Step 1/ is +implemented by calling xsltGetTemplate(), step 2/ is +implemented by xsltDefaultProcessOneNode() and step 3/ is +implemented by xsltApplyOneTemplate().

    + +

    XPath expression compilation

    + +

    The XPath support is actually implemented in the libxml module (where it +is reused by the XPointer implementation). XPath is a relatively classic +expression language. The only uncommon feature is that it is working on XML +trees and hence has specific syntax and types to handle them.

    + +

    XPath expressions are compiled using xmlXPathCompile(). It takes an expression string as input and generates a structure containing the parsed expression tree. For example, the expression:

    +
    /doc/chapter[title='Introduction']
    + +

    will be compiled as

    +
    Compiled Expression : 10 elements
    +  SORT
    +    COLLECT  'child' 'name' 'node' chapter
    +      COLLECT  'child' 'name' 'node' doc
    +        ROOT
    +      PREDICATE
    +        SORT
    +          EQUAL =
    +            COLLECT  'child' 'name' 'node' title
    +              NODE
    +            ELEM Object is a string : Introduction
    +              COLLECT  'child' 'name' 'node' title
    +                NODE
    + +

    This can be tested using the testXPath command (in the +libxml codebase) using the --tree option.

    + +

    Again, the KISS approach is used. No optimization is done. This could be an interesting thing to add. Michael Kay describes a lot of possible and interesting optimizations done in Saxon which would be possible at this level. I'm unsure they would provide much gain since the expressions tend to be relatively simple in general and stylesheets are still written by hand. Optimizations at the interpretation level sound likely to be more efficient.

    + +

    XPath interpretation

    + +

    The interpreter is implemented by xmlXPathCompiledEval(), which is the front-end to xmlXPathCompOpEval(), the function implementing the evaluation of the expression tree. This evaluation follows the KISS approach again. It is recursive and calls xmlXPathNodeCollectAndTest() to collect node sets when evaluating a COLLECT node.

    + +

    An evaluation is done within the framework of an XPath context stored in an xmlXPathContext structure. In the framework of a transformation, the context is maintained within the XSLT context. Its content follows the requirements from the XPath specification:

    +
      +
    • the current document
    • +
    • the current node
    • +
    • a hash table of defined variables (but not used by XSLT)
    • +
    • a hash table of defined functions
    • +
    • the proximity position (the place of the node in the current node + list)
    • +
    • the context size (the size of the current node list)
    • +
    • the array of namespace declarations in scope (there also is a namespace + hash table but it is not used in the XSLT transformation).
    • +
    + +
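To make the roles of the proximity position and the context size more concrete, here is a small self-contained sketch of how a predicate is evaluated against a node list. It is a toy model under stated assumptions: `toy_xpath_ctx`, `toy_filter` and the two predicates are hypothetical names, nodes are reduced to integer ids, and nothing here is the actual xmlXPathContext layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced stand-in for the evaluation context. */
typedef struct {
    int node;      /* the current node, reduced here to an integer id */
    int position;  /* proximity position, counted from 1 */
    int size;      /* context size: length of the current node list */
} toy_xpath_ctx;

/* A predicate looks at the context and answers true (non-zero) or false. */
typedef int (*toy_predicate)(const toy_xpath_ctx *ctx);

/*
 * Filter a node list through a predicate. Each node is evaluated with
 * itself as the current node, its 1-based rank as the proximity position
 * and the list length as the context size. Returns the number of nodes
 * written to out, which must have room for n entries.
 */
static size_t
toy_filter(const int *nodes, size_t n, toy_predicate pred, int *out)
{
    size_t i, kept = 0;
    toy_xpath_ctx ctx;

    assert(nodes != NULL && pred != NULL && out != NULL);
    ctx.size = (int) n;
    for (i = 0; i < n; i++) {
        ctx.node = nodes[i];
        ctx.position = (int) i + 1;
        if (pred(&ctx))
            out[kept++] = nodes[i];
    }
    return kept;
}

/* Equivalent of the predicate [position() = last()]. */
static int
toy_is_last(const toy_xpath_ctx *ctx)
{
    return ctx->position == ctx->size;
}

/* Equivalent of the predicate [position() < 3]. */
static int
toy_is_first_two(const toy_xpath_ctx *ctx)
{
    return ctx->position < 3;
}
```

Filtering the list 10, 20, 30, 40 with `toy_is_last` keeps only 40, while `toy_is_first_two` keeps 10 and 20.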

    For the purpose of XSLT an extra pointer has been added, allowing retrieval of the XSLT transformation context. When an XPath evaluation is about to be performed, an XPath parser context is allocated containing an XPath object stack (this is actually an XPath evaluation context; the name is a remnant of the time when there was no separate parsing and evaluation phase in the XPath implementation). Here is an overview of the set of contexts associated with an XPath evaluation within an XSLT transformation:

    + +

    + +

    Clearly this is a bit too complex and confusing and should be refactored +at the next set of binary incompatible releases of libxml. For example the +xmlXPathCtxt has a lot of unused parts and should probably be merged with +xmlXPathParserCtxt.

    + +

    Description of XPath Objects

    + +

    An XPath expression manipulates XPath objects. XPath defines the default +types boolean, numbers, strings and node sets. XSLT adds the result tree +fragment type which is basically an unmodifiable node set.

    + +

    Implementation-wise, libxml again follows a KISS approach: the xmlXPathObject is a structure containing a type description and the various possibilities. (Using a union could have saved some bytes.) In the case of node sets (or result tree fragments), it points to a separate xmlNodeSet object which contains the list of pointers to the document nodes:

    + +

    + +

    The XPath API (and +its 'internal' +part) includes a number of functions to create, copy, compare, convert or +free XPath objects.
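As an illustration of the "type description plus the various possibilities" layout and of one of the conversions mentioned above, here is a reduced sketch. The names `toy_object`, `toy_nodeset` and `toy_to_boolean` are hypothetical and the real xmlXPathObject carries more fields; the conversion rules themselves are those of the XPath 1.0 boolean() function.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_BOOLEAN, TOY_NUMBER, TOY_STRING, TOY_NODESET } toy_type;

/* Separate object holding the pointers to the document nodes. */
typedef struct {
    size_t count;        /* number of node pointers in use */
    const void **nodes;  /* pointers into the document tree, not owned */
} toy_nodeset;

/* One field per possible value, selected by the type tag. */
typedef struct {
    toy_type type;
    int boolval;
    double floatval;
    const char *stringval;
    toy_nodeset *nodesetval;
} toy_object;

/* Convert any object to a boolean following the XPath 1.0 rules. */
static int
toy_to_boolean(const toy_object *obj)
{
    assert(obj != NULL);
    switch (obj->type) {
    case TOY_BOOLEAN:
        return obj->boolval != 0;
    case TOY_NUMBER:
        /* true unless zero or NaN (NaN is the only value != itself) */
        return obj->floatval != 0.0 && obj->floatval == obj->floatval;
    case TOY_STRING:
        /* true if the string is not empty */
        return obj->stringval != NULL && obj->stringval[0] != '\0';
    case TOY_NODESET:
        /* true if the node set is not empty */
        return obj->nodesetval != NULL && obj->nodesetval->count > 0;
    }
    return 0;
}
```

Keeping every possible value as a separate field, as this sketch does, wastes a few bytes per object but keeps each accessor trivial.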

    + +

    XPath functions

    + +

    All the XPath functions available to the interpreter are registered in the +function hash table linked from the XPath context. They all share the same +signature:

    +
    void xmlXPathFunc (xmlXPathParserContextPtr ctxt, int nargs);
    + +

    The first argument is the XPath interpretation context, holding the +interpretation stack. The second argument defines the number of objects +passed on the stack for the function to consume (last argument is on top of +the stack).

    + +

    Basically an XPath function does the following:

    +
      +
    • check nargs for proper handling of errors or functions + with variable numbers of parameters
    • +
    • pop the parameters from the stack using obj = + valuePop(ctxt);
    • +
    • do the function specific computation
    • +
    • push the result parameter on the stack using valuePush(ctxt, + res);
    • +
    • free up the input parameters with + xmlXPathFreeObject(obj);
    • +
    • return
    • +
    + +
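The steps listed above can be sketched with a toy value stack. Everything in this block is a stand-in written for the illustration: `toy_ctxt`, `toy_value`, `toy_push`, `toy_pop` and `toy_string_length` are hypothetical names that only mimic the shape of the libxml interfaces (interpretation context, valuePush, valuePop), and the function counts bytes rather than characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOY_NUMBER, TOY_STRING } toy_kind;

/* Hypothetical, reduced stand-in for an XPath object. */
typedef struct {
    toy_kind kind;
    double number;
    char *string;               /* owned copy, NULL for numbers */
} toy_value;

#define TOY_STACK_MAX 16

/* Hypothetical stand-in for the interpretation context and its stack. */
typedef struct {
    toy_value *items[TOY_STACK_MAX];
    int depth;
    int error;                  /* set when a function is misused */
} toy_ctxt;

static toy_value *toy_new_number(double n)
{
    toy_value *v = malloc(sizeof(*v));
    if (v != NULL) {
        v->kind = TOY_NUMBER;
        v->number = n;
        v->string = NULL;
    }
    return v;
}

static toy_value *toy_new_string(const char *s)
{
    toy_value *v = malloc(sizeof(*v));
    if (v == NULL)
        return NULL;
    v->kind = TOY_STRING;
    v->number = 0.0;
    v->string = malloc(strlen(s) + 1);
    if (v->string == NULL) {
        free(v);
        return NULL;
    }
    strcpy(v->string, s);
    return v;
}

static void toy_free(toy_value *v)
{
    if (v != NULL) {
        free(v->string);
        free(v);
    }
}

static void toy_push(toy_ctxt *c, toy_value *v)
{
    assert(c->depth < TOY_STACK_MAX);
    c->items[c->depth++] = v;
}

static toy_value *toy_pop(toy_ctxt *c)
{
    return (c->depth > 0) ? c->items[--c->depth] : NULL;
}

/* Toy string-length(): counts bytes, which matches characters for ASCII. */
static void toy_string_length(toy_ctxt *c, int nargs)
{
    toy_value *arg, *res;

    if (nargs != 1) {                       /* check nargs */
        c->error = 1;
        return;
    }
    arg = toy_pop(c);                       /* pop the parameter */
    if (arg == NULL || arg->kind != TOY_STRING) {
        c->error = 1;
        toy_free(arg);
        return;
    }
    res = toy_new_number((double) strlen(arg->string));  /* compute */
    if (res == NULL)
        c->error = 1;
    else
        toy_push(c, res);                   /* push the result */
    toy_free(arg);                          /* free the input */
}
```

A caller pushes one string, calls `toy_string_length(&ctxt, 1)` and pops the numeric result; the stack depth is the same before and after the call, minus the consumed argument and plus the result.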

    Sometimes the work can be done directly by modifying in place the top object on the stack, ctxt->value.

    + +

    The XSLT variables stack frame

    + +

    Not to be confused with the XPath object stack, this stack holds the XSLT variables and parameters as they are defined through the recursive calls of call-template, apply-templates and default templates. This is used to define the scope of the variables being referenced.

    + +

    This part seems to need the most urgent attention right now. First, it is done in a very inefficient way, since the location of the variables and parameters within the stylesheet tree is still done at run time (it really should be done statically at compile time); second, I am still unsure that my understanding of the template variables and parameter scope is actually right.

    + +

    This part of the documentation is still to be written once this part of the code is stable. TODO

    + +

    Extension support

    + +

    There is a separate document explaining how the +extension support works.

    + +

    Further reading

    + +

    Michael Kay wrote a really interesting article on Saxon internals and the work he did on performance issues. I wish I had read it before starting the libxslt design (I would probably have avoided a few mistakes and progressed faster). A lot of the ideas in his papers should be implemented or at least tried in libxslt.

    + +

    The libxml documentation, especially the I/O interfaces and the memory management.

    + +

    TODOs

    + +

    Redesign the XSLT stack frame handling. Far too much work is done at execution time. The same goes for the handling of attribute value templates: at least the embedded subexpressions ought to be precompiled.

    + +

    Allow output to be saved to a SAX like output (this notion of SAX like API +for output should be added directly to libxml).

    + +

    Implement and test some of the optimizations explained by Michael Kay, especially:

    +
      +
    • static slot allocation on the stack frame
    • +
    • specific boolean interpretation of an XPath expression
    • +
    • some of the sorting optimizations
    • +
    • lazy evaluation of location paths (this may require more changes but sounds really interesting; XT does this too)
    • +
    • Optimization of an expression tree (This could be done as a completely + independent module.)
    • +
    + +

    + +

    Error reporting: there are a lot of cases where the XSLT specification specifies that a given construct is an error but libxslt does not check for it adequately. Basically one should do a complete pass on the XSLT spec again and add all tests to the stylesheet compilation. Using the DTD provided in the appendix and making direct checks using the libxml validation API sounds like a good idea too (though one should take care not to raise errors for elements/attributes in different namespaces).

    + +

    Double check all the places where the compiled form of the stylesheet might be modified at run time (extra removal of blank nodes, hint on the xsltCompMatch).

    + +

    + +

    Writing extensions

    + +

    Table of contents

    + + +

    Introduction

    + +

    This document describes the work needed to write extensions to the standard XSLT library for use with libxslt, the XSLT C library developed for the Gnome project.

    + +

    Before reading this document it is highly recommended to get familiar with the libxslt internals.

    + +

    Note: this documentation is by definition incomplete and I am not good at +spelling, grammar, so patches and suggestions are really welcome.

    + +

    Basics

    + +

    The XSLT specification provides +two ways to extend an XSLT engine:

    + + +

    In both cases the extensions need to be associated with a new namespace, i.e. a URI used as the name for the extension's namespace (there is no need to have a resource there for this to work).

    + +

    libxslt provides a few extensions itself, either in the libxslt namespace "http://xmlsoft.org/XSLT/" or in other namespaces for well known extensions provided by other XSLT processors like Saxon, Xalan or XT.

    + +

    Extension modules

    + +

    Since extensions are bound to a namespace name, a set of extensions coming from a given source usually shares the same namespace name, which in practice defines a group of extensions providing elements, functions or both. From the libxslt point of view such a group is considered an "extension module", and most of the APIs work at the module level.

    + +

    Registration of new functions or elements is bound to the activation of the module. This is currently done by declaring the namespace as an extension using the attribute extension-element-prefixes on the xsl:stylesheet element.

    + +

    An extension module is defined by 3 objects:

    +
      +
    • the namespace name associated
    • +
    • an initialization function
    • +
    • a shutdown function
    • +
    + +

    Registering a module

    + +

    Currently a libxslt module has to be compiled within the application using libxslt; there is no code to dynamically load shared libraries associated with a namespace (this may be added but is likely to become a portability nightmare).

    + +

    So the current way to register a module is to link the code implementing +it with the application and to call a registration function:

    +
    int xsltRegisterExtModule(const xmlChar *URI,
    +                          xsltExtInitFunction initFunc,
    +                          xsltExtShutdownFunction shutdownFunc);
    + +

    The associated header is included with:

    +
    #include <libxslt/extensions.h>
    + +

    which also defines the types for the initialization and shutdown functions.

    + +

    Loading a module

    + +

    Once the module URI has been registered, and if the XSLT processor detects that a given stylesheet needs the functionality of an extension module, that module is initialized.

    + +

    The xsltExtInitFunction type defines the interface for an initialization +function:

    +
    /**
    + * xsltExtInitFunction:
    + * @ctxt:  an XSLT transformation context
    + * @URI:  the namespace URI for the extension
    + *
    + * A function called at initialization time of an XSLT
    + * extension module
    + *
    + * Returns a pointer to the module specific data for this
    + * transformation
    + */
    +typedef void *(*xsltExtInitFunction)(xsltTransformContextPtr ctxt,
    +                                     const xmlChar *URI);
    + +

    There are 3 things to notice:

    +
      +
    • the function gets passed the namespace name URI as an argument; this allows a single function to provide the initialization for multiple logical modules
    • +
    • it also gets passed a transformation context; the initialization is done at run time before any processing occurs on the stylesheet, and it is invoked separately for each transformation
    • +
    • it returns a pointer, which can be used to store module-specific information to be retrieved later when a function or an element from the extension is used. An obvious example is a connection to a database, which should be kept and reused during the transformation. NULL is a perfectly valid return value; there is no way to indicate a failure at this level
    • +
    + +

    What this function is expected to do is:

    +
      +
    • prepare the context for this module (like opening the database + connection)
    • +
    • register the extensions specific to this module
    • +
    + +

    Registering an extension function

    + +

    There is a single call to do this registration:

    +
    int xsltRegisterExtFunction(xsltTransformContextPtr ctxt,
    +                            const xmlChar *name,
    +                            const xmlChar *URI,
    +                            xmlXPathEvalFunc function);
    + +

    The registration is bound to a single transformation instance referred to by ctxt, name is the UTF-8 encoded name for the NCName of the function, and URI is the namespace name for the extension (no checking is done, a module could register functions or elements from a different namespace, but it is not recommended).

    + +

    Implementing an extension function

    + +

    The implementation of the function must have the signature of a libxml +XPath function:

    +
    /**
    + * xmlXPathEvalFunc:
    + * @ctxt: an XPath parser context
    + * @nargs: the number of arguments passed to the function
    + *
    + * an XPath evaluation function, the parameters are on the
    + * XPath context stack
    + */
    +
    +typedef void (*xmlXPathEvalFunc)(xmlXPathParserContextPtr ctxt,
    +                                 int nargs);
    + +

    The context passed to an XPath function is not an XSLT context but an XPath context. However it is possible to +find one from the other:

    +
      +
    • The function xsltXPathGetTransformContext provides this lookup facility:
      xsltTransformContextPtr
      +         xsltXPathGetTransformContext
      +                          (xmlXPathParserContextPtr ctxt);
      +
    • +
    • The xmlXPathContextPtr associated to an + xsltTransformContext is stored in the xpathCtxt + field.
    • +
    + +

    The first thing an extension function may want to do is to check the arguments passed on the stack; nargs specifies how many of them were provided in the XPath expression. The valuePop macro extracts them from the XPath stack:

    +
    #include <libxml/xpath.h>
    +#include <libxml/xpathInternals.h>
    +
    +xmlXPathObjectPtr obj = valuePop(ctxt); 
    + +

    Note that ctxt is the XPath context, not the XSLT one. It is then possible to examine the content of the value. Check the description of XPath objects if necessary. The following is a common sequence checking whether the argument passed is a string and converting it using the built-in XPath string() function if this is not the case:

    +
    if (obj->type != XPATH_STRING) {
    +    valuePush(ctxt, obj);
    +    xmlXPathStringFunction(ctxt, 1);
    +    obj = valuePop(ctxt);
    +}
    + +

    Most common XPath functions are available directly at the C level and are +exported either in <libxml/xpath.h> or in +<libxml/xpathInternals.h>.

    + +

    The extension function may also need to retrieve the data associated with this module instance (the database connection in the previous example); this can be done using xsltGetExtData:

    +
    void * xsltGetExtData(xsltTransformContextPtr ctxt,
    +                      const xmlChar *URI);
    + +

    Again, the URI to be provided is the one that was used when registering the module.

    + +

    Once the function finishes, don't forget to:

    +
      +
    • push the return value on the stack using valuePush(ctxt, + obj)
    • +
    • deallocate the parameters passed to the function using + xmlXPathFreeObject(obj)
    • +
    + +

    Examples for extension functions

    + +

    The module libxslt/functions.c contains the sources of the XSLT built-in functions, including document(), key(), generate-id(), etc. as well as a full example module at the end. Here is the implementation of the libxslt:test function:

    +
    /**
    + * xsltExtFunctionTest:
    + * @ctxt:  the XPath Parser context
    + * @nargs:  the number of arguments
    + *
    + * function libxslt:test() for testing the extensions support.
    + */
    +static void
    +xsltExtFunctionTest(xmlXPathParserContextPtr ctxt, int nargs)
    +{
    +    xsltTransformContextPtr tctxt;
    +    void *data;
    +
    +    tctxt = xsltXPathGetTransformContext(ctxt);
    +    if (tctxt == NULL) {
    +        xsltGenericError(xsltGenericErrorContext,
    +            "xsltExtFunctionTest: failed to get the transformation context\n");
    +        return;
    +    }
    +    data = xsltGetExtData(tctxt, (const xmlChar *) XSLT_DEFAULT_URL);
    +    if (data == NULL) {
    +        xsltGenericError(xsltGenericErrorContext,
    +            "xsltExtFunctionTest: failed to get module data\n");
    +        return;
    +    }
    +#ifdef WITH_XSLT_DEBUG_FUNCTION
    +    xsltGenericDebug(xsltGenericDebugContext,
    +                     "libxslt:test() called with %d args\n", nargs);
    +#endif
    +}
    + +

    Registering an extension element

    + +

    There is a single call to do this registration:

    +
    int xsltRegisterExtElement(xsltTransformContextPtr ctxt,
    +                           const xmlChar *name,
    +                           const xmlChar *URI,
    +                           xsltTransformFunction function);
    + +

    It is similar to the mechanism used to register an extension function, +except that the signature of an extension element implementation is +different.

    + +

    The registration is bound to a single transformation instance referred to by ctxt, name is the UTF-8 encoded name for the NCName of the element, and URI is the namespace name for the extension (no checking is done, a module could register elements for a different namespace, but it is not recommended).

    + +

    Implementing an extension element

    + +

    The implementation of the element must have the signature of an XSLT +transformation function:

    +
    /** 
    + * xsltTransformFunction: 
    + * @ctxt: the XSLT transformation context
    + * @node: the input node
    + * @inst: the stylesheet node 
    + * @comp: the compiled information from the stylesheet 
    + * 
    + * signature of the function associated to elements part of the
    + * stylesheet language like xsl:if or xsl:apply-templates.
    + */ 
    +typedef void (*xsltTransformFunction)
    +                          (xsltTransformContextPtr ctxt,
    +                           xmlNodePtr node,
    +                           xmlNodePtr inst,
    +                           xsltStylePreCompPtr comp);
    + +

    The first argument is the XSLT transformation context. The second and third arguments are xmlNodePtr, i.e. the internal memory representation of XML nodes. They are respectively node, from the input document being transformed by the stylesheet, and inst, the extension element in the stylesheet. The last argument is comp, a pointer to a precompiled representation of inst, but for extension elements this value is usually NULL by default (it could be added and associated to the instruction in inst->_private).

    + +

    The same functions are available from a function implementing an extension +element as in an extension function, including +xsltGetExtData().

    + +

    Since the goal of an extension element is usually to enrich the generated output, it is expected to grow the currently generated output tree. This can be done by grabbing ctxt->insert, which is the current libxml node being generated (note that this can also be the intermediate value tree being built, for example to initialize a variable; the processing should be similar). The functions for libxml tree manipulation from <libxml/tree.h> can be employed to extend or modify the tree, but it is required to preserve the insertion node and its ancestors since there are existing pointers to those elements still in use in the XSLT template execution stack.

    + +

    Example for extension elements

    + +

    The module libxslt/transform.c contains the sources of the XSLT built-in elements, including xsl:element, xsl:attribute, xsl:if, etc. There is a small but full example in functions.c providing the implementation for the libxslt:test element; it outputs a comment in the result tree:

    +
    /**
    + * xsltExtElementTest:
    + * @ctxt:  an XSLT processing context
    + * @node:  The current node
    + * @inst:  the instruction in the stylesheet
    + * @comp:  precomputed informations
    + *
    + * Process a libxslt:test node
    + */
    +static void
    +xsltExtElementTest(xsltTransformContextPtr ctxt, xmlNodePtr node,
    +                   xmlNodePtr inst,
    +                   xsltStylePreCompPtr comp)
    +{
    +    xmlNodePtr comment;
    +
    +    if (ctxt == NULL) {
    +        xsltGenericError(xsltGenericErrorContext,
    +                         "xsltExtElementTest: no transformation context\n");
    +        return;
    +    }
    +    if (node == NULL) {
    +        xsltGenericError(xsltGenericErrorContext,
    +                         "xsltExtElementTest: no current node\n");
    +        return;
    +    }
    +    if (inst == NULL) {
    +        xsltGenericError(xsltGenericErrorContext,
    +                         "xsltExtElementTest: no instruction\n");
    +        return;
    +    }
    +    if (ctxt->insert == NULL) {
    +        xsltGenericError(xsltGenericErrorContext,
    +                         "xsltExtElementTest: no insertion point\n");
    +        return;
    +    }
    +    comment =
    +        xmlNewComment((const xmlChar *)
    +                      "libxslt:test element test worked");
    +    xmlAddChild(ctxt->insert, comment);
    +}
    + +

    The shutdown of a module

    + +

    When the XSLT processor ends a transformation, the shutdown function (if it exists) of each initialized module is called. The xsltExtShutdownFunction type defines the interface for a shutdown function:

    +
    /**
    + * xsltExtShutdownFunction:
    + * @ctxt:  an XSLT transformation context
    + * @URI:  the namespace URI for the extension
    + * @data:  the data associated to this module
    + *
    + * A function called at shutdown time of an XSLT extension module
    + */
    +typedef void (*xsltExtShutdownFunction) (xsltTransformContextPtr ctxt,
    +                                         const xmlChar *URI,
    +                                         void *data);
    + +

    This is really similar to a module initialization function, except that a third argument is passed: the value that was returned by the initialization function. This allows the module to deallocate its resources, for example to close the connection to the database, to keep the same example.
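The whole life cycle of a module (registration, initialization for one transformation, data retrieval, shutdown) can be summed up with the following toy model. It does not use libxslt at all: `toy_transform`, `toy_register_module`, `toy_load_module`, `toy_get_ext_data`, `toy_shutdown_all` and the demo module are hypothetical stand-ins that only mirror the shape of the interfaces described in this document, and a simple counter plays the role of the database connection.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_MODULES 8

/* Hypothetical stand-in for the per-transformation context. */
typedef struct toy_transform {
    void *ext_data[TOY_MAX_MODULES];  /* value returned by each init call */
    int ext_loaded[TOY_MAX_MODULES];  /* non-zero once a module is loaded */
} toy_transform;

typedef void *(*toy_init_fn)(toy_transform *ctxt, const char *uri);
typedef void (*toy_shutdown_fn)(toy_transform *ctxt, const char *uri,
                                void *data);

/* Application-wide registry: namespace name plus the two callbacks. */
typedef struct {
    const char *uri;
    toy_init_fn init;
    toy_shutdown_fn shutdown;
} toy_module;

static toy_module toy_modules[TOY_MAX_MODULES];
static int toy_module_count = 0;

/* Register a module once, before any transformation; 0 on success. */
static int toy_register_module(const char *uri, toy_init_fn init,
                               toy_shutdown_fn shutdown)
{
    if (uri == NULL || init == NULL || toy_module_count >= TOY_MAX_MODULES)
        return -1;
    toy_modules[toy_module_count].uri = uri;
    toy_modules[toy_module_count].init = init;
    toy_modules[toy_module_count].shutdown = shutdown;
    toy_module_count++;
    return 0;
}

/* Index of the module registered for uri, or -1. */
static int toy_find_module(const char *uri)
{
    int i;
    for (i = 0; uri != NULL && i < toy_module_count; i++)
        if (strcmp(toy_modules[i].uri, uri) == 0)
            return i;
    return -1;
}

/* Initialize the module for this transformation if not already done. */
static int toy_load_module(toy_transform *ctxt, const char *uri)
{
    int i = toy_find_module(uri);
    if (i < 0)
        return -1;
    if (!ctxt->ext_loaded[i]) {
        ctxt->ext_data[i] = toy_modules[i].init(ctxt, uri);
        ctxt->ext_loaded[i] = 1;
    }
    return 0;
}

/* Retrieve the data the init function returned for this transformation. */
static void *toy_get_ext_data(toy_transform *ctxt, const char *uri)
{
    int i = toy_find_module(uri);
    return (i >= 0 && ctxt->ext_loaded[i]) ? ctxt->ext_data[i] : NULL;
}

/* End of the transformation: hand each loaded module its data back. */
static void toy_shutdown_all(toy_transform *ctxt)
{
    int i;
    for (i = 0; i < toy_module_count; i++) {
        if (!ctxt->ext_loaded[i])
            continue;
        if (toy_modules[i].shutdown != NULL)
            toy_modules[i].shutdown(ctxt, toy_modules[i].uri,
                                    ctxt->ext_data[i]);
        ctxt->ext_data[i] = NULL;
        ctxt->ext_loaded[i] = 0;
    }
}

/* Demo module: a counter stands in for an open database connection. */
static int demo_open_connections = 0;

static void *demo_init(toy_transform *ctxt, const char *uri)
{
    (void) ctxt; (void) uri;
    demo_open_connections++;
    return &demo_open_connections;
}

static void demo_shutdown(toy_transform *ctxt, const char *uri, void *data)
{
    (void) ctxt; (void) uri;
    assert(data == &demo_open_connections);
    (*(int *) data)--;
}
```

The point of the sketch is the ownership rule: the pointer returned by the initialization function is stored per transformation and is handed back unchanged to the shutdown function, so the module never needs global state to find its own resources.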

    + +

    Future work

    + +

    Some of the pieces still missing:

    +
      +
    • a way to load shared libraries to instantiate new modules
    • +
    • a better detection of extension function usage and their registration + without having to use the extension prefix which ought to be reserved to + element extensions.
    • +
    • more examples
    • +
    • implementations of the EXSLT common + extension libraries, Thomas Broyer nearly finished implementing them.
    • +
    + +

    +

    Contributions