mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-07-14 20:01:04 +03:00

Applied a spelling patch from Geert Kloosterman to xml.html, and regenerated
the web site, Daniel
Daniel Veillard
2002-05-20 06:51:05 +00:00
parent 6d1ef17b17
commit 63d83142ff
19 changed files with 284 additions and 285 deletions

@ -88,23 +88,23 @@ A:link, A:visited, A:active { text-decoration: underline }
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"> <td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Table of Content:</p> <p>Table of Content:</p>
<ul> <ul>
<li><a href="FAQ.html#Licence">Licence(s)</a></li> <li><a href="FAQ.html#License">License(s)</a></li>
<li><a href="FAQ.html#Installati">Installation</a></li> <li><a href="FAQ.html#Installati">Installation</a></li>
<li><a href="FAQ.html#Compilatio">Compilation</a></li> <li><a href="FAQ.html#Compilatio">Compilation</a></li>
<li><a href="FAQ.html#Developer">Developer corner</a></li> <li><a href="FAQ.html#Developer">Developer corner</a></li>
</ul> </ul>
<h3> <h3>
<a name="Licence">Licence</a>(s)</h3> <a name="License">License</a>(s)</h3>
<ol> <ol>
<li> <li>
<em>Licensing Terms for libxml</em> <em>Licensing Terms for libxml</em>
<p>libxml is released under the <a href="http://www.opensource.org/licenses/mit-license.html">MIT <p>libxml is released under the <a href="http://www.opensource.org/licenses/mit-license.html">MIT
Licence</a>, see the file Copyright in the distribution for the precise License</a>, see the file Copyright in the distribution for the precise
wording</p> wording</p>
</li> </li>
<li> <li>
<em>Can I embed libxml in a proprietary application ?</em> <em>Can I embed libxml in a proprietary application ?</em>
<p>Yes. The MIT Licence allows you to also keep proprietary the changes <p>Yes. The MIT License allows you to also keep proprietary the changes
you made to libxml, but it would be graceful to provide back bug fixes and you made to libxml, but it would be graceful to provide back bug fixes and
improvements as patches for possible incorporation in the main improvements as patches for possible incorporation in the main
development tree</p> development tree</p>
@ -119,7 +119,7 @@ A:link, A:visited, A:active { text-decoration: underline }
<em>Where can I get libxml</em> ? <em>Where can I get libxml</em> ?
<p>The original distribution comes from <a href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or <a href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">gnome.org</a> <p>The original distribution comes from <a href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or <a href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">gnome.org</a>
</p> </p>
<p>Most linux and Bsd distribution includes libxml, this is probably the <p>Most Linux and BSD distributions include libxml, this is probably the
safer way for end-users</p> safer way for end-users</p>
<p>David Doolin provides precompiled Windows versions at <a href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/%20%20%20%20%20%20%20%20%20">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a> <p>David Doolin provides precompiled Windows versions at <a href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/%20%20%20%20%20%20%20%20%20">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a>
</p> </p>
@ -150,8 +150,8 @@ A:link, A:visited, A:active { text-decoration: underline }
</li> </li>
<li> <li>
<em>I can't install the libxml(2) RPM package due to failed <em>I can't install the libxml(2) RPM package due to failed
dependancies</em> dependencies</em>
<p>The most generic solution is to refetch the latest src.rpm , and <p>The most generic solution is to re-fetch the latest src.rpm , and
rebuild it locally with</p> rebuild it locally with</p>
<p><code>rpm --rebuild libxml(2)-xxx.src.rpm</code></p> <p><code>rpm --rebuild libxml(2)-xxx.src.rpm</code></p>
<p>if everything goes well it will generate two binary rpm (one providing <p>if everything goes well it will generate two binary rpm (one providing
@ -188,7 +188,7 @@ A:link, A:visited, A:active { text-decoration: underline }
highly portable and available widely compression library</li> highly portable and available widely compression library</li>
<li>iconv: a powerful character encoding conversion library. It's <li>iconv: a powerful character encoding conversion library. It's
included by default on recent glibc libraries, so it doesn't need to included by default on recent glibc libraries, so it doesn't need to
be installed specifically on linux. It seems it's now <a href="http://www.opennc.org/onlinepubs/7908799/xsh/iconv.html">part be installed specifically on Linux. It seems it's now <a href="http://www.opennc.org/onlinepubs/7908799/xsh/iconv.html">part
of the official UNIX</a> specification. Here is one <a href="http://clisp.cons.org/~haible/packages-libiconv.html">implementation of the official UNIX</a> specification. Here is one <a href="http://clisp.cons.org/~haible/packages-libiconv.html">implementation
of the library</a> which source can be found <a href="ftp://ftp.ilog.fr/pub/Users/haible/gnu/">here</a>.</li> of the library</a> which source can be found <a href="ftp://ftp.ilog.fr/pub/Users/haible/gnu/">here</a>.</li>
</ul> </ul>
@ -248,7 +248,7 @@ A:link, A:visited, A:active { text-decoration: underline }
<p><em>I want to the get the content of the first node (node with the <p><em>I want to the get the content of the first node (node with the
CommFlag=&quot;0&quot;)</em></p> CommFlag=&quot;0&quot;)</em></p>
<p><em>so I did it as following;</em></p> <p><em>so I did it as following;</em></p>
<pre>xmlNodePtr pode; <pre>xmlNodePtr pnode;
pnode=pxmlDoc-&gt;children-&gt;children;</pre> pnode=pxmlDoc-&gt;children-&gt;children;</pre>
<p><em>but it does not work. If I change it to</em></p> <p><em>but it does not work. If I change it to</em></p>
<pre>pnode=pxmlDoc-&gt;children-&gt;children-&gt;next;</pre> <pre>pnode=pxmlDoc-&gt;children-&gt;children-&gt;next;</pre>
@ -257,7 +257,7 @@ pnode=pxmlDoc-&gt;children-&gt;children;</pre>
<p>In XML all characters in the content of the document are significant <p>In XML all characters in the content of the document are significant
<strong>including blanks and formatting line breaks</strong>.</p> <strong>including blanks and formatting line breaks</strong>.</p>
<p>The extra nodes you are wondering about are just that, text nodes with <p>The extra nodes you are wondering about are just that, text nodes with
the formatting spaces wich are part of the document but that people tend the formatting spaces which are part of the document but that people tend
to forget. There is a function <a href="http://xmlsoft.org/html/libxml-parser.html">xmlKeepBlanksDefault to forget. There is a function <a href="http://xmlsoft.org/html/libxml-parser.html">xmlKeepBlanksDefault
()</a> to remove those at parse time, but that's an heuristic, and its ()</a> to remove those at parse time, but that's an heuristic, and its
use should be limited to case where you are sure there is no use should be limited to case where you are sure there is no
@ -300,7 +300,7 @@ pnode=pxmlDoc-&gt;children-&gt;children;</pre>
generated doc</a> generated doc</a>
</li> </li>
<li>looks for examples of use for libxml function using the Gnome code <li>looks for examples of use for libxml function using the Gnome code
for example the following will query the full Gnome CVs base for the for example the following will query the full Gnome CVS base for the
use of the <strong>xmlAddChild()</strong> function: use of the <strong>xmlAddChild()</strong> function:
<p><a href="http://cvs.gnome.org/lxr/search?string=xmlAddChild">http://cvs.gnome.org/lxr/search?string=xmlAddChild</a></p> <p><a href="http://cvs.gnome.org/lxr/search?string=xmlAddChild">http://cvs.gnome.org/lxr/search?string=xmlAddChild</a></p>
<p>This may be slow, a large hardware donation to the gnome project <p>This may be slow, a large hardware donation to the gnome project
@ -318,7 +318,7 @@ pnode=pxmlDoc-&gt;children-&gt;children;</pre>
<p>libxml is written in pure C in order to allow easy reuse on a number <p>libxml is written in pure C in order to allow easy reuse on a number
of platforms, including embedded systems. I don't intend to convert to of platforms, including embedded systems. I don't intend to convert to
C++.</p> C++.</p>
<p>There is however a few C++ wrappers which may fullfill your needs:</p> <p>There is however a few C++ wrappers which may fulfill your needs:</p>
<ul> <ul>
<li>by Ari Johnson &lt;ari@btigate.com&gt;: <li>by Ari Johnson &lt;ari@btigate.com&gt;:
<p>Website: <a href="http://lusis.org/~ari/xml%2B%2B/">http://lusis.org/~ari/xml++/</a> <p>Website: <a href="http://lusis.org/~ari/xml%2B%2B/">http://lusis.org/~ari/xml++/</a>
@ -336,7 +336,7 @@ pnode=pxmlDoc-&gt;children-&gt;children;</pre>
<p>It is possible to validate documents which had not been validated at <p>It is possible to validate documents which had not been validated at
initial parsing time or documents who have been built from scratch using initial parsing time or documents who have been built from scratch using
the API. Use the <a href="http://xmlsoft.org/html/libxml-valid.html#XMLVALIDATEDTD">xmlValidateDtd()</a> the API. Use the <a href="http://xmlsoft.org/html/libxml-valid.html#XMLVALIDATEDTD">xmlValidateDtd()</a>
function. It is also possible to simply add a Dtd to an existing function. It is also possible to simply add a DTD to an existing
document:</p> document:</p>
<pre>xmlDocPtr doc; /* your existing document */ <pre>xmlDocPtr doc; /* your existing document */
xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */ xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */
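
The FAQ entry in this file's diff about `children->children` returning an unexpected node comes down to whitespace-only text nodes sitting between elements. A minimal sketch of skipping them while walking siblings follows. The `Node` struct and both function names are hypothetical stand-ins invented for illustration; they are not libxml's `xmlNode` or any libxml function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a parsed tree node; NOT libxml's xmlNode. */
typedef enum { NODE_ELEMENT, NODE_TEXT } NodeType;

typedef struct Node {
    NodeType type;
    const char *content;   /* text for NODE_TEXT, NULL for elements */
    struct Node *next;     /* next sibling, NULL at the end of the list */
} Node;

/* Returns 1 if n is a text node containing only whitespace, else 0. */
static int is_blank_text(const Node *n)
{
    const char *p;

    assert(n != NULL);
    if (n->type != NODE_TEXT)
        return 0;
    if (n->content == NULL)
        return 1;
    for (p = n->content; *p != '\0'; p++) {
        if (!isspace((unsigned char)*p))
            return 0;
    }
    return 1;
}

/* Returns the first sibling at or after n that is not a blank text node,
 * or NULL if every remaining sibling is blank. */
static const Node *first_significant(const Node *n)
{
    while (n != NULL && is_blank_text(n))
        n = n->next;
    return n;
}
```

With a helper of this shape, the question's `->next` workaround is no longer tied to how the document happens to be indented, and the blanks stay in the tree for saving.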

@ -110,7 +110,7 @@ to be closed</strong>. XML is pedantic about this. However, if a tag is empty
it ends with <code>/&gt;</code> rather than with <code>&gt;</code>. Note it ends with <code>/&gt;</code> rather than with <code>&gt;</code>. Note
that, for example, the image tag has no content (just an attribute) and is that, for example, the image tag has no content (just an attribute) and is
closed by ending the tag with <code>/&gt;</code>.</p> closed by ending the tag with <code>/&gt;</code>.</p>
<p>XML can be applied sucessfully to a wide range of uses, from long term <p>XML can be applied successfully to a wide range of uses, from long term
structured document maintenance (where it follows the steps of SGML) to structured document maintenance (where it follows the steps of SGML) to
simple data encoding mechanisms like configuration file formatting (glade), simple data encoding mechanisms like configuration file formatting (glade),
spreadsheets (gnumeric), or even shorter lived documents such as WebDAV where spreadsheets (gnumeric), or even shorter lived documents such as WebDAV where

@ -105,13 +105,13 @@ posting</span></strong>:</p>
version</a>, and that the problem still shows up in those</li> version</a>, and that the problem still shows up in those</li>
<li>check the <a href="http://mail.gnome.org/archives/xml/">list <li>check the <a href="http://mail.gnome.org/archives/xml/">list
archives</a> to see if the problem was reported already, in this case archives</a> to see if the problem was reported already, in this case
there is probably a fix available, similary check the <a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">registered there is probably a fix available, similarly check the <a href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">registered
open bugs</a> open bugs</a>
</li> </li>
<li>make sure you can reproduce the bug with xmllint or one of the test <li>make sure you can reproduce the bug with xmllint or one of the test
programs found in source in the distribution</li> programs found in source in the distribution</li>
<li>Please send the command showing the error as well as the input (as an <li>Please send the command showing the error as well as the input (as an
attachement)</li> attachment)</li>
</ul> </ul>
<p>Then send the bug with associated informations to reproduce it to the <a href="mailto:xml@gnome.org">xml@gnome.org</a> list; if it's really libxml <p>Then send the bug with associated informations to reproduce it to the <a href="mailto:xml@gnome.org">xml@gnome.org</a> list; if it's really libxml
related I will approve it.. Please do not send me mail directly, it makes related I will approve it.. Please do not send me mail directly, it makes
@ -122,8 +122,8 @@ probably be processed faster.</p>
<p>If you're looking for help, a quick look at <a href="http://mail.gnome.org/archives/xml/">the list archive</a> may actually <p>If you're looking for help, a quick look at <a href="http://mail.gnome.org/archives/xml/">the list archive</a> may actually
provide the answer, I usually send source samples when answering libxml usage provide the answer, I usually send source samples when answering libxml usage
questions. The <a href="http://xmlsoft.org/html/book1.html">auto-generated questions. The <a href="http://xmlsoft.org/html/book1.html">auto-generated
documentantion</a> is not as polished as I would like (i need to learn more documentation</a> is not as polished as I would like (i need to learn more
about Docbook), but it's a good starting point.</p> about DocBook), but it's a good starting point.</p>
<p><a href="bugs.html">Daniel Veillard</a></p> <p><a href="bugs.html">Daniel Veillard</a></p>
</td></tr></table></td></tr></table></td></tr></table></td> </td></tr></table></td></tr></table></td></tr></table></td>
</tr></table></td></tr></table> </tr></table></td></tr></table>

@ -384,7 +384,7 @@ support.</p>
<p>The XML Catalog specification is relatively recent so there isn't much <p>The XML Catalog specification is relatively recent so there isn't much
literature to point at:</p> literature to point at:</p>
<ul> <ul>
<li>You can find an good rant from Norm Walsh about <a href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the <li>You can find a good rant from Norm Walsh about <a href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
need for catalogs</a>, it provides a lot of context informations even if need for catalogs</a>, it provides a lot of context informations even if
I don't agree with everything presented. Norm also wrote a more recent I don't agree with everything presented. Norm also wrote a more recent
article <a href="http://wwws.sun.com/software/xml/developers/resolver/article/">XML article <a href="http://wwws.sun.com/software/xml/developers/resolver/article/">XML
@ -405,7 +405,7 @@ literature to point at:</p>
~/xmlcatalog and ~/dbkxmlcatalog and doing: ~/xmlcatalog and ~/dbkxmlcatalog and doing:
<p><code>export XMLCATALOG=$HOME/xmlcatalog</code></p> <p><code>export XMLCATALOG=$HOME/xmlcatalog</code></p>
<p>should allow to process DocBook documentations without requiring <p>should allow to process DocBook documentations without requiring
network accesses for the DTd or stylesheets</p> network accesses for the DTD or stylesheets</p>
</li> </li>
<li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a <li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a
small tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems small tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems

@ -102,7 +102,7 @@ A:link, A:visited, A:active { text-decoration: underline }
</li> </li>
<li> <li>
<a href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt <a href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt
Sergeant</a> developped <a href="http://axkit.org/download/">XML::LibXSLT</a>, a perl wrapper for Sergeant</a> developed <a href="http://axkit.org/download/">XML::LibXSLT</a>, a Perl wrapper for
libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML
application server</a> application server</a>
</li> </li>

@ -101,18 +101,18 @@ A:link, A:visited, A:active { text-decoration: underline }
<p>XML was designed from the start to allow the support of any character set <p>XML was designed from the start to allow the support of any character set
by using Unicode. Any conformant XML parser has to support the UTF-8 and by using Unicode. Any conformant XML parser has to support the UTF-8 and
UTF-16 default encodings which can both express the full unicode ranges. UTF8 UTF-16 default encodings which can both express the full unicode ranges. UTF8
is a variable length encoding whose greatest point are to resuse the same is a variable length encoding whose greatest points are to reuse the same
emcoding for ASCII and to save space for Western encodings, but it is a bit encoding for ASCII and to save space for Western encodings, but it is a bit
more complex to handle in practice. UTF-16 use 2 bytes per characters (and more complex to handle in practice. UTF-16 use 2 bytes per characters (and
sometimes combines two pairs), it makes implementation easier, but looks a sometimes combines two pairs), it makes implementation easier, but looks a
bit overkill for Western languages encoding. Moreover the XML specification bit overkill for Western languages encoding. Moreover the XML specification
allows document to be encoded in other encodings at the condition that they allows document to be encoded in other encodings at the condition that they
are clearly labelled as such. For example the following is a wellformed XML are clearly labeled as such. For example the following is a wellformed XML
document encoded in ISO-8859 1 and using accentuated letter that we French document encoded in ISO-8859 1 and using accentuated letter that we French
likes for both markup and content:</p> likes for both markup and content:</p>
<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;ISO-8859-1&quot;?&gt; <pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;ISO-8859-1&quot;?&gt;
&lt;très&gt;là&lt;/très&gt;</pre> &lt;très&gt;là&lt;/très&gt;</pre>
<p>Having internationalization support in libxml means the foolowing:</p> <p>Having internationalization support in libxml means the following:</p>
<ul> <ul>
<li>the document is properly parsed</li> <li>the document is properly parsed</li>
<li>informations about it's encoding are saved</li> <li>informations about it's encoding are saved</li>
@ -125,7 +125,7 @@ likes for both markup and content:</p>
exception of a few routines to read with a specific encoding or save to a exception of a few routines to read with a specific encoding or save to a
specific encoding, is completely agnostic about the original encoding of the specific encoding, is completely agnostic about the original encoding of the
document.</p> document.</p>
<p>It should be noted too that the HTML parser embedded in libxml now obbey <p>It should be noted too that the HTML parser embedded in libxml now obey
the same rules too, the following document will be (as of 2.2.2) handled in the same rules too, the following document will be (as of 2.2.2) handled in
an internationalized fashion by libxml too:</p> an internationalized fashion by libxml too:</p>
<pre>&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.0 Transitional//EN&quot; <pre>&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.0 Transitional//EN&quot;
@ -151,7 +151,7 @@ rationale for those choices:</p>
cases this may make sense.</li> cases this may make sense.</li>
<li>the second decision was which encoding. From the XML spec only UTF8 and <li>the second decision was which encoding. From the XML spec only UTF8 and
UTF16 really makes sense as being the two only encodings for which there UTF16 really makes sense as being the two only encodings for which there
is amndatory support. UCS-4 (32 bits fixed size encoding) could be is mandatory support. UCS-4 (32 bits fixed size encoding) could be
considered an intelligent choice too since it's a direct Unicode mapping considered an intelligent choice too since it's a direct Unicode mapping
support. I selected UTF-8 on the basis of efficiency and compatibility support. I selected UTF-8 on the basis of efficiency and compatibility
with surrounding software: with surrounding software:
@ -210,7 +210,7 @@ err.xml:1: error: Bytes: 0xE8 0x73 0x3E 0x6C
&lt;très&gt;là&lt;/très&gt; &lt;très&gt;là&lt;/très&gt;
^</pre> ^</pre>
</li> </li>
<li>xmlSwitchEncoding() does an encoding name lookup, canonalize it, and <li>xmlSwitchEncoding() does an encoding name lookup, canonicalize it, and
then search the default registered encoding converters for that encoding. then search the default registered encoding converters for that encoding.
If it's not within the default set and iconv() support has been compiled If it's not within the default set and iconv() support has been compiled
it, it will ask iconv for such an encoder. If this fails then the parser it, it will ask iconv for such an encoder. If this fails then the parser
@ -220,7 +220,7 @@ err2.xml:1: error: Unsupported encoding UnsupportedEnc
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UnsupportedEnc&quot;?&gt; &lt;?xml version=&quot;1.0&quot; encoding=&quot;UnsupportedEnc&quot;?&gt;
^</pre> ^</pre>
</li> </li>
<li>From that point the encoder process progressingly the input (it is <li>From that point the encoder processes progressingly the input (it is
plugged as a front-end to the I/O module) for that entity. It captures plugged as a front-end to the I/O module) for that entity. It captures
and convert on-the-fly the document to be parsed to UTF-8. The parser and convert on-the-fly the document to be parsed to UTF-8. The parser
itself just does UTF-8 checking of this input and process it itself just does UTF-8 checking of this input and process it
@ -230,8 +230,8 @@ err2.xml:1: error: Unsupported encoding UnsupportedEnc
<li>The result (when using DOM) is an internal form completely in UTF-8 <li>The result (when using DOM) is an internal form completely in UTF-8
with just an encoding information on the document node.</li> with just an encoding information on the document node.</li>
</ol> </ol>
<p>Ok then what's happen when saving the document (assuming you <p>Ok then what happens when saving the document (assuming you
colllected/built an xmlDoc DOM like structure) ? It depends on the function collected/built an xmlDoc DOM like structure) ? It depends on the function
called, xmlSaveFile() will just try to save in the original encoding, while called, xmlSaveFile() will just try to save in the original encoding, while
xmlSaveFileTo() and xmlSaveFileEnc() can optionally save to a given xmlSaveFileTo() and xmlSaveFileEnc() can optionally save to a given
encoding:</p> encoding:</p>
@ -242,7 +242,7 @@ encoding:</p>
<p>otherwise everything is written in the internal form, i.e. UTF-8</p> <p>otherwise everything is written in the internal form, i.e. UTF-8</p>
</li> </li>
<li>so if an encoding was specified, either at the API level or on the <li>so if an encoding was specified, either at the API level or on the
document, libxml will again canonalize the encoding name, lookup for a document, libxml will again canonicalize the encoding name, lookup for a
converter in the registered set or through iconv. If not found the converter in the registered set or through iconv. If not found the
function will return an error code</li> function will return an error code</li>
<li>the converter is placed before the I/O buffer layer, as another kind of <li>the converter is placed before the I/O buffer layer, as another kind of
@ -250,14 +250,14 @@ encoding:</p>
that buffer, which will then progressively be converted and pushed onto that buffer, which will then progressively be converted and pushed onto
the I/O layer.</li> the I/O layer.</li>
<li>It is possible that the converter code fails on some input, for example <li>It is possible that the converter code fails on some input, for example
trying to push an UTF-8 encoded chinese character through the UTF-8 to trying to push an UTF-8 encoded Chinese character through the UTF-8 to
ISO-8859-1 converter won't work. Since the encoders are progressive they ISO-8859-1 converter won't work. Since the encoders are progressive they
will just report the error and the number of bytes converted, at that will just report the error and the number of bytes converted, at that
point libxml will decode the offending character, remove it from the point libxml will decode the offending character, remove it from the
buffer and replace it with the associated charRef encoding &amp;#123; and buffer and replace it with the associated charRef encoding &amp;#123; and
resume the convertion. This guarante that any document will be saved resume the conversion. This guarantees that any document will be saved
without losses (except for markup names where this is not legal, this is without losses (except for markup names where this is not legal, this is
a problem in the current version, in pactice avoid using non-ascci a problem in the current version, in practice avoid using non-ascii
characters for tags or attributes names @@). A special &quot;ascii&quot; encoding characters for tags or attributes names @@). A special &quot;ascii&quot; encoding
name is used to save documents to a pure ascii form can be used when name is used to save documents to a pure ascii form can be used when
portability is really crucial</li> portability is really crucial</li>
@ -288,7 +288,7 @@ detecting such a tag on input. Except for that the processing is the same
<li>HTML, a specific handler for the conversion of UTF-8 to ASCII with HTML <li>HTML, a specific handler for the conversion of UTF-8 to ASCII with HTML
predefined entities like &amp;copy; for the Copyright sign.</li> predefined entities like &amp;copy; for the Copyright sign.</li>
</ol> </ol>
<p>More over when compiled on an Unix platfor with iconv support the full set <p>More over when compiled on an Unix platform with iconv support the full set
of encodings supported by iconv can be instantly be used by libxml. On a of encodings supported by iconv can be instantly be used by libxml. On a
linux machine with glibc-2.1 the list of supported encodings and aliases fill linux machine with glibc-2.1 the list of supported encodings and aliases fill
3 full pages, and include UCS-4, the full set of ISO-Latin encodings, and the 3 full pages, and include UCS-4, the full set of ISO-Latin encodings, and the
@ -323,7 +323,7 @@ tried it. The key is to override the default conversion routines (by
registering null encoders/decoders for your charsets), and bypass the UTF-8 registering null encoders/decoders for your charsets), and bypass the UTF-8
checking of the parser by setting the parser context charset checking of the parser by setting the parser context charset
(ctxt-&gt;charset) to something different than XML_CHAR_ENCODING_UTF8, but (ctxt-&gt;charset) to something different than XML_CHAR_ENCODING_UTF8, but
there is no guarantee taht this will work. You may also have some troubles there is no guarantee that this will work. You may also have some troubles
saving back.</p> saving back.</p>
<p>Basically proper I18N support is important, this requires at least <p>Basically proper I18N support is important, this requires at least
libxml-2.0.0, but a lot of features and corrections are really available only libxml-2.0.0, but a lot of features and corrections are really available only
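
The save-time fallback described in this file's diff (when a UTF-8 character cannot be expressed in the target encoding, replace it with a character reference and resume the conversion) can be sketched roughly as below. This is an illustration under stated assumptions, not libxml's converter: `utf8_decode` and `utf8_to_latin1_charref` are hypothetical names, the target is fixed to ISO-8859-1, and the decoder does not reject overlong UTF-8 forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Decodes one UTF-8 sequence from s (len bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 on malformed input.
 * Simplification: overlong forms and surrogates are not rejected. */
static size_t utf8_decode(const unsigned char *s, size_t len, unsigned long *cp)
{
    size_t need, i;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        *cp = s[0] & 0x1F; need = 1;
    } else if ((s[0] & 0xF0) == 0xE0) {
        *cp = s[0] & 0x0F; need = 2;
    } else if ((s[0] & 0xF8) == 0xF0) {
        *cp = s[0] & 0x07; need = 3;
    } else {
        return 0;
    }
    if (len < need + 1)
        return 0;
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    return need + 1;
}

/* Converts UTF-8 input to ISO-8859-1 in out (capacity outsz, including the
 * terminating NUL). Code points above 0xFF are written as decimal character
 * references such as &#20013;. Returns the number of bytes written (without
 * the NUL), or -1 on malformed input or insufficient space. */
static long utf8_to_latin1_charref(const char *in, size_t inlen,
                                   char *out, size_t outsz)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i < inlen) {
        unsigned long cp;
        size_t n = utf8_decode(s + i, inlen - i, &cp);

        if (n == 0)
            return -1;
        if (cp <= 0xFF) {
            /* Directly representable: one output byte. */
            if (o + 1 >= outsz)
                return -1;
            out[o++] = (char)(unsigned char)cp;
        } else {
            /* Not representable: fall back to a character reference. */
            int w = snprintf(out + o, outsz - o, "&#%lu;", cp);
            if (w < 0 || (size_t)w >= outsz - o)
                return -1;
            o += (size_t)w;
        }
        i += n;
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return (long)o;
}
```

The fallback is only safe in character data and attribute values; as the text notes, it is not legal in markup names, so a sketch like this would have to be kept away from tag and attribute names.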

@ -101,7 +101,7 @@ beginning). Example:</p>
7 &lt;/EXAMPLE&gt;</pre> 7 &lt;/EXAMPLE&gt;</pre>
<p>Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing <p>Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing
its name with '&amp;' and following it by ';' without any spaces added. There its name with '&amp;' and following it by ';' without any spaces added. There
are 5 predefined entities in libxml allowing you to escape charaters with are 5 predefined entities in libxml allowing you to escape characters with
predefined meaning in some parts of the xml document content: predefined meaning in some parts of the xml document content:
<strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong> <strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong>
for the character '&gt;', <strong>&amp;apos;</strong> for the character ''', for the character '&gt;', <strong>&amp;apos;</strong> for the character ''',
@ -113,7 +113,7 @@ your application. Or you may prefer to keep entity references as such in the
content to be able to save the document back without losing this usually content to be able to save the document back without losing this usually
precious information (if the user went through the pain of explicitly precious information (if the user went through the pain of explicitly
defining entities, he may have a a rather negative attitude if you blindly defining entities, he may have a a rather negative attitude if you blindly
susbtitute them as saving time). The <a href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a> substitute them as saving time). The <a href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function allows you to check and change the behaviour, which is to not function allows you to check and change the behaviour, which is to not
substitute entities by default.</p> substitute entities by default.</p>
<p>Here is the DOM tree built by libxml for the previous document in the <p>Here is the DOM tree built by libxml for the previous document in the
@ -148,7 +148,7 @@ finding them in the input).</p>
<p> <p>
<span style="background-color: #FF0000">WARNING</span>: handling entities <span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml SAX interface is difficult!!! If you plan to use on top of the libxml SAX interface is difficult!!! If you plan to use
non-predefined entities in your documents, then the learning cuvre to handle non-predefined entities in your documents, then the learning curve to handle
then using the SAX API may be long. If you plan to use complex documents, I then using the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml strongly suggest you consider using the DOM interface instead and let libxml
deal with the complexity rather than trying to do it yourself.</p> deal with the complexity rather than trying to do it yourself.</p>
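
The five predefined entities mentioned in this file's diff exist so that characters with markup meaning can appear in content. A small sketch of escaping a string with them is shown below; `predefined_entity` and `escape_predefined` are hypothetical helper names chosen for this example and are not part of the libxml API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the predefined entity reference for c, or NULL if c can be
 * written as-is. */
static const char *predefined_entity(char c)
{
    switch (c) {
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '&':  return "&amp;";
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    default:   return NULL;
    }
}

/* Copies in to out (capacity outsz, including the terminating NUL),
 * replacing the five special characters with their entity references.
 * Returns 0 on success, -1 if out is too small. */
static int escape_predefined(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        const char *ent = predefined_entity(*in);
        size_t n = ent ? strlen(ent) : 1;

        if (o + n >= outsz)
            return -1;
        if (ent)
            memcpy(out + o, ent, n);
        else
            out[o] = *in;
        o += n;
    }
    if (o >= outsz)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Escaping `&` along with the other four matters: a literal ampersand left in the output would otherwise be read as the start of an entity reference.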

@ -148,7 +148,7 @@ base</a>:</p>
&lt;/gjob:Jobs&gt; &lt;/gjob:Jobs&gt;
&lt;/gjob:Helping&gt;</pre> &lt;/gjob:Helping&gt;</pre>
<p>While loading the XML file into an internal DOM tree is a matter of <p>While loading the XML file into an internal DOM tree is a matter of
calling only a couple of functions, browsing the tree to gather the ata and calling only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder, and more error prone.</p> generate the internal structures is harder, and more error prone.</p>
<p>The suggested principle is to be tolerant with respect to the input <p>The suggested principle is to be tolerant with respect to the input
structure. For example, the ordering of the attributes is not significant, structure. For example, the ordering of the attributes is not significant,
@ -200,8 +200,8 @@ DEBUG(&quot;parsePerson\n&quot;);
<p>Here are a couple of things to notice:</p> <p>Here are a couple of things to notice:</p>
<ul> <ul>
<li>Usually a recursive parsing style is the more convenient one: XML data <li>Usually a recursive parsing style is the more convenient one: XML data
is by nature subject to repetitive constructs and usually exibits highly is by nature subject to repetitive constructs and usually exhibits highly
stuctured patterns.</li> structured patterns.</li>
<li>The two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>, <li>The two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>,
i.e. the pointer to the global XML document and the namespace reserved to i.e. the pointer to the global XML document and the namespace reserved to
the application. Document wide information are needed for example to the application. Document wide information are needed for example to
@ -267,7 +267,7 @@ DEBUG(&quot;parseJob\n&quot;);
return(ret); return(ret);
}</pre> }</pre>
<p>Once you are used to it, writing this kind of code is quite simple, but <p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it could be possble to write stubbers taking either C boring. Ultimately, it could be possible to write stubbers taking either C
data structure definitions, a set of XML examples or an XML DTD and produce data structure definitions, a set of XML examples or an XML DTD and produce
the code needed to import and export the content between C data and XML the code needed to import and export the content between C data and XML
storage. This is left as an exercise to the reader :-)</p> storage. This is left as an exercise to the reader :-)</p>
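As a rough illustration of the tolerant, recursive style described in this hunk, here is a minimal sketch. It makes several assumptions: it uses Python's standard xml.etree.ElementTree rather than libxml, and the element names (jobs, job, contact, name, email) and the helper functions are hypothetical, not the gjob schema or the parsePerson/parseJob code from the page. Each function handles one level of the tree, ignores children it does not know, and does not depend on child ordering.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: children appear in varying order and
# some elements are unknown to the application.
SAMPLE = """
<jobs>
  <job><note>ignored</note><contact><email>a@example.org</email><name>Ann</name></contact></job>
  <job><contact><name>Bob</name></contact><unknown/></job>
</jobs>
"""


def parse_person(node):
    """Collect known fields; missing ones stay None, unknown ones are skipped."""
    person = {"name": None, "email": None}
    for child in node:
        if child.tag in person:
            person[child.tag] = (child.text or "").strip()
    return person


def parse_job(node):
    """Recurse into the contact element if present, ignore everything else."""
    job = {"contact": None}
    for child in node:
        if child.tag == "contact":
            job["contact"] = parse_person(child)
    return job


def parse_jobs(root):
    """Top level: keep only the job elements."""
    return [parse_job(child) for child in root if child.tag == "job"]


jobs = parse_jobs(ET.fromstring(SAMPLE))
for job in jobs:
    print(job)
```

The same shape carries over to C code walking libxml nodes: one function per element type, a loop over the children, and a default branch that silently skips what it does not recognize.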


@ -87,7 +87,7 @@ A:link, A:visited, A:active { text-decoration: underline }
</td></tr></table></td> </td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"> <td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p> <p>
<p>Libxml is the XML C library developped for the Gnome project. XML itself <p>Libxml is the XML C library developed for the Gnome project. XML itself
is a metalanguage to design markup languages, i.e. text language where is a metalanguage to design markup languages, i.e. text language where
semantic and structure are added to the content using extra &quot;markup&quot; semantic and structure are added to the content using extra &quot;markup&quot;
information enclosed between angle bracket. HTML is the most well-known information enclosed between angle bracket. HTML is the most well-known


@ -86,7 +86,7 @@ A:link, A:visited, A:active { text-decoration: underline }
</table> </table>
</td></tr></table></td> </td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"> <td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>This document describes libxml, the <a href="http://www.w3.org/XML/">XML</a> C library developped for the <a href="http://www.gnome.org/">Gnome</a> project. <a href="http://www.w3.org/XML/">XML is a standard</a> for building tag-based <p>This document describes libxml, the <a href="http://www.w3.org/XML/">XML</a> C library developed for the <a href="http://www.gnome.org/">Gnome</a> project. <a href="http://www.w3.org/XML/">XML is a standard</a> for building tag-based
structured documents/data.</p> structured documents/data.</p>
<p>Here are some key points about libxml:</p> <p>Here are some key points about libxml:</p>
<ul> <ul>
@ -98,14 +98,14 @@ structured documents/data.</p>
<li>It is written in plain C, making as few assumptions as possible, and <li>It is written in plain C, making as few assumptions as possible, and
sticking closely to ANSI C/POSIX for easy embedding. Works on sticking closely to ANSI C/POSIX for easy embedding. Works on
Linux/Unix/Windows, ported to a number of other platforms.</li> Linux/Unix/Windows, ported to a number of other platforms.</li>
<li>Basic support for HTTP and FTP client allowing aplications to fetch <li>Basic support for HTTP and FTP client allowing applications to fetch
remote resources</li> remote resources</li>
<li>The design is modular, most of the extensions can be compiled out.</li> <li>The design is modular, most of the extensions can be compiled out.</li>
<li>The internal document repesentation is as close as possible to the <a href="http://www.w3.org/DOM/">DOM</a> interfaces.</li> <li>The internal document representation is as close as possible to the <a href="http://www.w3.org/DOM/">DOM</a> interfaces.</li>
<li>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX <li>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX
like interface</a>; the interface is designed to be compatible with <a href="http://www.jclark.com/xml/expat.html">Expat</a>.</li> like interface</a>; the interface is designed to be compatible with <a href="http://www.jclark.com/xml/expat.html">Expat</a>.</li>
<li>This library is released under the <a href="http://www.opensource.org/licenses/mit-license.html">MIT <li>This library is released under the <a href="http://www.opensource.org/licenses/mit-license.html">MIT
Licence</a> see the Copyright file in the distribution for the precise License</a> see the Copyright file in the distribution for the precise
wording.</li> wording.</li>
</ul> </ul>
<p>Warning: unless you are forced to because your application links with a <p>Warning: unless you are forced to because your application links with a


@ -87,7 +87,7 @@ A:link, A:visited, A:active { text-decoration: underline }
</td></tr></table></td> </td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"> <td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>The libxml library implements <a href="http://www.w3.org/TR/REC-xml-names/">XML namespaces</a> support by <p>The libxml library implements <a href="http://www.w3.org/TR/REC-xml-names/">XML namespaces</a> support by
recognizing namespace contructs in the input, and does namespace lookup recognizing namespace constructs in the input, and does namespace lookup
automatically when building the DOM tree. A namespace declaration is automatically when building the DOM tree. A namespace declaration is
associated with an in-memory structure and all elements or attributes within associated with an in-memory structure and all elements or attributes within
that namespace point to it. Hence testing the namespace is a simple and fast that namespace point to it. Hence testing the namespace is a simple and fast
@ -104,7 +104,7 @@ value in the long-term. Example:</p>
&lt;/mydoc&gt;</pre> &lt;/mydoc&gt;</pre>
<p>The namespace value has to be an absolute URL, but the URL doesn't have to <p>The namespace value has to be an absolute URL, but the URL doesn't have to
point to any existing resource on the Web. It will bind all the element and point to any existing resource on the Web. It will bind all the element and
atributes with that URL. I suggest to use an URL within a domain you control, attributes with that URL. I suggest to use an URL within a domain you control,
and that the URL should contain some kind of version information if possible. and that the URL should contain some kind of version information if possible.
For example, <code>&quot;http://www.gnome.org/gnumeric/1.0/&quot;</code> is a good For example, <code>&quot;http://www.gnome.org/gnumeric/1.0/&quot;</code> is a good
namespace scheme.</p> namespace scheme.</p>
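The point that elements are bound to the namespace URL, not to the prefix, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard xml.etree.ElementTree instead of libxml (which exposes the namespace as a pointer to an in-memory structure), and the element names sheet and cell as well as the helper split_name are made up for the example. Only the URL comes from the text above.

```python
import xml.etree.ElementTree as ET

GNUMERIC_NS = "http://www.gnome.org/gnumeric/1.0/"

# Two documents binding the same namespace URL to different prefixes.
# The element names are hypothetical.
DOC_A = '<a:sheet xmlns:a="%s"><a:cell>1</a:cell></a:sheet>' % GNUMERIC_NS
DOC_B = '<b:sheet xmlns:b="%s"><b:cell>1</b:cell></b:sheet>' % GNUMERIC_NS


def split_name(tag):
    """Split ElementTree's '{url}local' form into (url, local)."""
    if tag.startswith("{"):
        url, local = tag[1:].split("}", 1)
        return url, local
    return None, tag


root_a = ET.fromstring(DOC_A)
root_b = ET.fromstring(DOC_B)

# The prefix chosen by each document is irrelevant once parsed.
print(split_name(root_a.tag))
print(root_a.tag == root_b.tag)
```

Because the comparison is done on the URL, changing the version component of the URL yields a different namespace, which is why embedding version information in it is useful.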


@ -109,7 +109,7 @@ it's actually not compiled in by default. The real fixes are:</p>
</ul> </ul>
<h3>2.4.20: Apr 15 2002</h3> <h3>2.4.20: Apr 15 2002</h3>
<ul> <ul>
<li>bug fixes: file descriptor leak, XPath, HTML ouput, DTD validation</li> <li>bug fixes: file descriptor leak, XPath, HTML output, DTD validation</li>
<li>XPath conformance testing by Richard Jinks</li> <li>XPath conformance testing by Richard Jinks</li>
<li>Portability fixes: Solaris, MPE/iX, Windows, OSF/1, python bindings, <li>Portability fixes: Solaris, MPE/iX, Windows, OSF/1, python bindings,
libxml.m4</li> libxml.m4</li>
@ -125,7 +125,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.4.18: Mar 18 2002</h3> <h3>2.4.18: Mar 18 2002</h3>
<ul> <ul>
<li>bug fixes: tree, SAX, canonicalization, validation, portability, <li>bug fixes: tree, SAX, canonicalization, validation, portability,
xpath</li> XPath</li>
<li>removed the --with-buffer option it was becoming unmaintainable</li> <li>removed the --with-buffer option it was becoming unmaintainable</li>
<li>serious cleanup of the Python makefiles</li> <li>serious cleanup of the Python makefiles</li>
<li>speedup patch to XPath very effective for DocBook stylesheets</li> <li>speedup patch to XPath very effective for DocBook stylesheets</li>
@ -137,7 +137,7 @@ it's actually not compiled in by default. The real fixes are:</p>
XPath&quot;</li> XPath&quot;</li>
<li>fixed/improved the Python wrappers, added more examples and more <li>fixed/improved the Python wrappers, added more examples and more
regression tests, XPath extension functions can now return node-sets</li> regression tests, XPath extension functions can now return node-sets</li>
<li>added the XML Canonalization support from Aleksey Sanin</li> <li>added the XML Canonicalization support from Aleksey Sanin</li>
</ul> </ul>
<h3>2.4.16: Feb 20 2002</h3> <h3>2.4.16: Feb 20 2002</h3>
<ul> <ul>
@ -153,9 +153,9 @@ it's actually not compiled in by default. The real fixes are:</p>
</ul> </ul>
<h3>2.4.14: Feb 8 2002</h3> <h3>2.4.14: Feb 8 2002</h3>
<ul> <ul>
<li>Change of Licence to the <a href="http://www.opensource.org/licenses/mit-license.html">MIT <li>Change of License to the <a href="http://www.opensource.org/licenses/mit-license.html">MIT
Licence</a> basisally for integration in XFree86 codebase, and removing License</a> basically for integration in XFree86 codebase, and removing
confusion around the previous dual-licencing</li> confusion around the previous dual-licensing</li>
<li>added Python bindings, beta software but should already be quite <li>added Python bindings, beta software but should already be quite
complete</li> complete</li>
<li>a large number of fixes and cleanups, especially for all tree <li>a large number of fixes and cleanups, especially for all tree
@ -230,7 +230,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>portability and configure fixes</li> <li>portability and configure fixes</li>
<li>an infinite loop on the HTML parser was removed (William)</li> <li>an infinite loop on the HTML parser was removed (William)</li>
<li>Windows makefile patches from Igor</li> <li>Windows makefile patches from Igor</li>
<li>fixed half a dozen bugs reported fof libxml or libxslt</li> <li>fixed half a dozen bugs reported for libxml or libxslt</li>
<li>updated xmlcatalog to be able to modify SGML super catalogs</li> <li>updated xmlcatalog to be able to modify SGML super catalogs</li>
</ul> </ul>
<h3>2.4.5: Sep 14 2001</h3> <h3>2.4.5: Sep 14 2001</h3>
@ -259,7 +259,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<ul> <ul>
<li>adds xmlLineNumbersDefault() to control line number generation</li> <li>adds xmlLineNumbersDefault() to control line number generation</li>
<li>lot of bug fixes</li> <li>lot of bug fixes</li>
<li>the Microsoft MSC projects files shuld now be up to date</li> <li>the Microsoft MSC projects files should now be up to date</li>
<li>inheritance of namespaces from DTD defaulted attributes</li> <li>inheritance of namespaces from DTD defaulted attributes</li>
<li>fixes a serious potential security bug</li> <li>fixes a serious potential security bug</li>
<li>added a --format option to xmllint</li> <li>added a --format option to xmllint</li>
@ -275,20 +275,20 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.4.0: July 10 2001</h3> <h3>2.4.0: July 10 2001</h3>
<ul> <ul>
<li>Fixed a few bugs in XPath, validation, and tree handling.</li> <li>Fixed a few bugs in XPath, validation, and tree handling.</li>
<li>Fixed XML Base implementation, added a coupel of examples to the <li>Fixed XML Base implementation, added a couple of examples to the
regression tests</li> regression tests</li>
<li>A bit of cleanup</li> <li>A bit of cleanup</li>
</ul> </ul>
<h3>2.3.14: July 5 2001</h3> <h3>2.3.14: July 5 2001</h3>
<ul> <ul>
<li>fixed some entities problems and reduce mem requirement when <li>fixed some entities problems and reduce memory requirement when
substituing them</li> substituting them</li>
<li>lots of improvements in the XPath queries interpreter can be <li>lots of improvements in the XPath queries interpreter can be
substancially faster</li> substantially faster</li>
<li>Makefiles and configure cleanups</li> <li>Makefiles and configure cleanups</li>
<li>Fixes to XPath variable eval, and compare on empty node set</li> <li>Fixes to XPath variable eval, and compare on empty node set</li>
<li>HTML tag closing bug fixed</li> <li>HTML tag closing bug fixed</li>
<li>Fixed an URI reference computating problem when validating</li> <li>Fixed an URI reference computation problem when validating</li>
</ul> </ul>
<h3>2.3.13: June 28 2001</h3> <h3>2.3.13: June 28 2001</h3>
<ul> <ul>
@ -342,9 +342,9 @@ it's actually not compiled in by default. The real fixes are:</p>
<p>Lots of bugfixes, and added a basic SGML catalog support:</p> <p>Lots of bugfixes, and added a basic SGML catalog support:</p>
<ul> <ul>
<li>HTML push bugfix #54891 and another patch from Jonas Borgström</li> <li>HTML push bugfix #54891 and another patch from Jonas Borgström</li>
<li>some serious speed optimisation again</li> <li>some serious speed optimization again</li>
<li>some documentation cleanups</li> <li>some documentation cleanups</li>
<li>trying to get better linking on solaris (-R)</li> <li>trying to get better linking on Solaris (-R)</li>
<li>XPath API cleanup from Thomas Broyer</li> <li>XPath API cleanup from Thomas Broyer</li>
<li>Validation bug fixed #54631, added a patch from Gary Pennington, fixed <li>Validation bug fixed #54631, added a patch from Gary Pennington, fixed
xmlValidGetValidElements()</li> xmlValidGetValidElements()</li>
@ -374,12 +374,12 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.3.7: April 22 2001</h3> <h3>2.3.7: April 22 2001</h3>
<ul> <ul>
<li>lots of small bug fixes, corrected XPointer</li> <li>lots of small bug fixes, corrected XPointer</li>
<li>Non determinist content model validation support</li> <li>Non deterministic content model validation support</li>
<li>added xmlDocCopyNode for gdome2</li> <li>added xmlDocCopyNode for gdome2</li>
<li>revamped the way the HTML parser handles end of tags</li> <li>revamped the way the HTML parser handles end of tags</li>
<li>XPath: corrctions of namespacessupport and number formatting</li> <li>XPath: corrections of namespaces support and number formatting</li>
<li>Windows: Igor Zlatkovic patches for MSC compilation</li> <li>Windows: Igor Zlatkovic patches for MSC compilation</li>
<li>HTML ouput fixes from P C Chow and William M. Brack</li> <li>HTML output fixes from P C Chow and William M. Brack</li>
<li>Improved validation speed sensible for DocBook</li> <li>Improved validation speed sensible for DocBook</li>
<li>fixed a big bug with ID declared in external parsed entities</li> <li>fixed a big bug with ID declared in external parsed entities</li>
<li>portability fixes, update of Trio from Bjorn Reese</li> <li>portability fixes, update of Trio from Bjorn Reese</li>
@ -417,7 +417,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>Bjorn fixed XPath node collection and Number formatting</li> <li>Bjorn fixed XPath node collection and Number formatting</li>
<li>Fixed a loop reported in the HTML parsing</li> <li>Fixed a loop reported in the HTML parsing</li>
<li>blank space are reported even if the Dtd content model proves that they <li>blank space are reported even if the Dtd content model proves that they
are formatting spaces, this is for XmL conformance</li> are formatting spaces, this is for XML conformance</li>
</ul> </ul>
<h3>2.3.3: Mar 1 2001</h3> <h3>2.3.3: Mar 1 2001</h3>
<ul> <ul>
@ -455,7 +455,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>added HTML to the RPM packages</li> <li>added HTML to the RPM packages</li>
<li>tree copying bugfixes</li> <li>tree copying bugfixes</li>
<li>updates to Windows makefiles</li> <li>updates to Windows makefiles</li>
<li>optimisation patch from Bjorn Reese</li> <li>optimization patch from Bjorn Reese</li>
</ul> </ul>
<h3>2.2.11: Jan 4 2001</h3> <h3>2.2.11: Jan 4 2001</h3>
<ul> <ul>
@ -528,7 +528,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>cleanup of entity handling code</li> <li>cleanup of entity handling code</li>
<li>overall review of all loops in the parsers, all sprintf usage has been <li>overall review of all loops in the parsers, all sprintf usage has been
checked too</li> checked too</li>
<li>Far better handling of larges Dtd. Validating against Docbook XML Dtd <li>Far better handling of larges Dtd. Validating against DocBook XML Dtd
works smoothly now.</li> works smoothly now.</li>
</ul> </ul>
<h3>1.8.10: Sep 6 2000</h3> <h3>1.8.10: Sep 6 2000</h3>
@ -573,7 +573,7 @@ it's actually not compiled in by default. The real fixes are:</p>
</ul> </ul>
<h3>2.1.0 and 1.8.8: June 29 2000</h3> <h3>2.1.0 and 1.8.8: June 29 2000</h3>
<ul> <ul>
<li>1.8.8 is mostly a comodity package for upgrading to libxml2 accoding to <li>1.8.8 is mostly a commodity package for upgrading to libxml2 according to
<a href="upgrade.html">new instructions</a>. It fixes a nasty problem <a href="upgrade.html">new instructions</a>. It fixes a nasty problem
about &amp;#38; charref parsing</li> about &amp;#38; charref parsing</li>
<li>2.1.0 also ease the upgrade from libxml v1 to the recent version. it <li>2.1.0 also ease the upgrade from libxml v1 to the recent version. it
@ -582,7 +582,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>added xmlStopParser() to stop parsing</li> <li>added xmlStopParser() to stop parsing</li>
<li>improved a lot parsing speed when there is large CDATA blocs</li> <li>improved a lot parsing speed when there is large CDATA blocs</li>
<li>includes XPath patches provided by Picdar Technology</li> <li>includes XPath patches provided by Picdar Technology</li>
<li>tried to fix as much as possible DtD validation and namespace <li>tried to fix as much as possible DTD validation and namespace
related problems</li> related problems</li>
<li>output to a given encoding has been added/tested</li> <li>output to a given encoding has been added/tested</li>
<li>lot of various fixes</li> <li>lot of various fixes</li>
@ -592,8 +592,8 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.0.0: Apr 12 2000</h3> <h3>2.0.0: Apr 12 2000</h3>
<ul> <ul>
<li>First public release of libxml2. If you are using libxml, it's a good <li>First public release of libxml2. If you are using libxml, it's a good
idea to check the 1.x to 2.x upgrade instructions. NOTE: while initally idea to check the 1.x to 2.x upgrade instructions. NOTE: while initially
scheduled for Apr 3 the relase occured only on Apr 12 due to massive scheduled for Apr 3 the release occurred only on Apr 12 due to massive
workload.</li> workload.</li>
<li>The include are now located under $prefix/include/libxml (instead of <li>The include are now located under $prefix/include/libxml (instead of
$prefix/include/gnome-xml), they also are referenced by $prefix/include/gnome-xml), they also are referenced by
@ -632,16 +632,16 @@ it's actually not compiled in by default. The real fixes are:</p>
<ul> <ul>
<li>fix I18N support. ISO-Latin-x/UTF-8/UTF-16 (nearly) seems correctly <li>fix I18N support. ISO-Latin-x/UTF-8/UTF-16 (nearly) seems correctly
handled now</li> handled now</li>
<li>Better handling of entities, especially well formedness checking <li>Better handling of entities, especially well-formedness checking
and proper PEref extensions in external subsets</li> and proper PEref extensions in external subsets</li>
<li>DTD conditional sections</li> <li>DTD conditional sections</li>
<li>Validation now correcly handle entities content</li> <li>Validation now correctly handle entities content</li>
<li><a href="http://rpmfind.net/tools/gdome/messages/0039.html">change <li><a href="http://rpmfind.net/tools/gdome/messages/0039.html">change
structures to accomodate DOM</a></li> structures to accommodate DOM</a></li>
</ul> </ul>
</li> </li>
<li>Serious progress were made toward compliance, <a href="conf/result.html">here are the result of the test</a> against the <li>Serious progress were made toward compliance, <a href="conf/result.html">here are the result of the test</a> against the
OASIS testsuite (except the japanese tests since I don't support that OASIS testsuite (except the Japanese tests since I don't support that
encoding yet). This URL is rebuilt every couple of hours using the CVS encoding yet). This URL is rebuilt every couple of hours using the CVS
head version.</li> head version.</li>
</ul> </ul>
@ -684,7 +684,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<ul> <ul>
<li>a Push interface for the XML and HTML parsers</li> <li>a Push interface for the XML and HTML parsers</li>
<li>a shell-like interface to the document tree (try tester --shell :-)</li> <li>a shell-like interface to the document tree (try tester --shell :-)</li>
<li>lots of bug fixes and improvement added over XMas hollidays</li> <li>lots of bug fixes and improvement added over XMas holidays</li>
<li>fixed the DTD parsing code to work with the xhtml DTD</li> <li>fixed the DTD parsing code to work with the xhtml DTD</li>
<li>added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()</li> <li>added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()</li>
<li>Fixed bugs in xmlNewNs()</li> <li>Fixed bugs in xmlNewNs()</li>
@ -722,8 +722,8 @@ it's actually not compiled in by default. The real fixes are:</p>
dataset from <a href="mailto:cnygard@bellatlantic.net">Carl Nygard</a>, dataset from <a href="mailto:cnygard@bellatlantic.net">Carl Nygard</a>,
configure with --with-buffers to enable them.</li> configure with --with-buffers to enable them.</li>
<li>attribute normalization, oops should have been added long ago !</li> <li>attribute normalization, oops should have been added long ago !</li>
<li>attributes defaulted from Dtds should be available, xmlSetProp() now <li>attributes defaulted from DTDs should be available, xmlSetProp() now
does entities escapting by default.</li> does entities escaping by default.</li>
</ul> </ul>
<h3>1.7.4: Oct 25 1999</h3> <h3>1.7.4: Oct 25 1999</h3>
<ul> <ul>
@ -735,7 +735,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>1.7.3: Sep 29 1999</h3> <h3>1.7.3: Sep 29 1999</h3>
<ul> <ul>
<li>portability problems fixed</li> <li>portability problems fixed</li>
<li>snprintf was used unconditionnally, leading to link problems on system <li>snprintf was used unconditionally, leading to link problems on system
were it's not available, fixed</li> were it's not available, fixed</li>
</ul> </ul>
<h3>1.7.1: Sep 24 1999</h3> <h3>1.7.1: Sep 24 1999</h3>
@ -748,7 +748,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>Changed another error : the use of a structure field called errno, and <li>Changed another error : the use of a structure field called errno, and
leading to troubles on platforms where it's a macro</li> leading to troubles on platforms where it's a macro</li>
</ul> </ul>
<h3>1.7.0: sep 23 1999</h3> <h3>1.7.0: Sep 23 1999</h3>
<ul> <ul>
<li>Added the ability to fetch remote DTD or parsed entities, see the <a href="html/libxml-nanohttp.html">nanohttp</a> module.</li> <li>Added the ability to fetch remote DTD or parsed entities, see the <a href="html/libxml-nanohttp.html">nanohttp</a> module.</li>
<li>Added an errno to report errors by another mean than a simple printf <li>Added an errno to report errors by another mean than a simple printf


@ -106,7 +106,7 @@ or libxslt wrappers or bindings:</p>
</li> </li>
<li> <li>
<a href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt <a href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt
Sergeant</a> developped <a href="http://axkit.org/download/">XML::LibXSLT</a>, a perl wrapper for Sergeant</a> developed <a href="http://axkit.org/download/">XML::LibXSLT</a>, a Perl wrapper for
libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML
application server</a> application server</a>
</li> </li>
@ -126,7 +126,7 @@ or libxslt wrappers or bindings:</p>
</li> </li>
<li>There is support for libxml2 in the DOM module of PHP.</li> <li>There is support for libxml2 in the DOM module of PHP.</li>
</ul> </ul>
<p>The distribution includes a set of Python bindings, which are garanteed to <p>The distribution includes a set of Python bindings, which are guaranteed to
be maintained as part of the library in the future, though the Python be maintained as part of the library in the future, though the Python
interface have not yet reached the maturity of the C API.</p> interface have not yet reached the maturity of the C API.</p>
<p>To install the Python bindings there are 2 options:</p> <p>To install the Python bindings there are 2 options:</p>
@ -163,14 +163,13 @@ doc.freeDoc()</pre>
<p>The Python module is called libxml2, parseFile is the equivalent of <p>The Python module is called libxml2, parseFile is the equivalent of
xmlParseFile (most of the bindings are automatically generated, and the xml xmlParseFile (most of the bindings are automatically generated, and the xml
prefix is removed and the casing convention are kept). All node seen at the prefix is removed and the casing convention are kept). All node seen at the
binding level share the same subset of accesors:</p> binding level share the same subset of accessors:</p>
<ul> <ul>
<li> <li>
<code>name</code> : returns the node name</li> <code>name</code> : returns the node name</li>
<li> <li>
<code>type</code> : returns a string indicating the node <code>type</code> : returns a string indicating the node
typ<code>e</code> type</li>
</li>
<li> <li>
<code>content</code> : returns the content of the node, it is based on <code>content</code> : returns the content of the node, it is based on
xmlNodeGetContent() and hence is recursive.</li> xmlNodeGetContent() and hence is recursive.</li>
@ -180,7 +179,7 @@ binding level share the same subset of accesors:</p>
<code>properties</code>: pointing to the associated element in the tree, <code>properties</code>: pointing to the associated element in the tree,
those may return None in case no such link exists.</li> those may return None in case no such link exists.</li>
</ul> </ul>
<p>Also note the need to explicitely deallocate documents with freeDoc() . <p>Also note the need to explicitly deallocate documents with freeDoc() .
Reference counting for libxml2 trees would need quite a lot of work to Reference counting for libxml2 trees would need quite a lot of work to
function properly, and rather than risk memory leaks if not implemented function properly, and rather than risk memory leaks if not implemented
correctly it sounds safer to have an explicit function to free a tree. The correctly it sounds safer to have an explicit function to free a tree. The
@ -191,7 +190,7 @@ collected.</p>
messages:</p> messages:</p>
<pre>import libxml2 <pre>import libxml2
#desactivate error messages from the validation #deactivate error messages from the validation
def noerr(ctx, str): def noerr(ctx, str):
pass pass
@ -204,13 +203,13 @@ doc = ctxt.doc()
valid = ctxt.isValid() valid = ctxt.isValid()
doc.freeDoc() doc.freeDoc()
if valid != 0: if valid != 0:
print &quot;validity chec failed&quot;</pre> print &quot;validity check failed&quot;</pre>
<p>The first thing to notice is the call to registerErrorHandler(), it <p>The first thing to notice is the call to registerErrorHandler(), it
defines a new error handler global to the library. It is used to avoid seeing defines a new error handler global to the library. It is used to avoid seeing
the error messages when trying to validate the invalid document.</p> the error messages when trying to validate the invalid document.</p>
<p>The main interest of that test is the creation of a parser context with <p>The main interest of that test is the creation of a parser context with
createFileParserCtxt() and how the behaviour can be changed before calling createFileParserCtxt() and how the behaviour can be changed before calling
parseDocument() . Similary the informations resulting from the parsing phase parseDocument() . Similarly the informations resulting from the parsing phase
are also available using context methods.</p> are also available using context methods.</p>
<p>Contexts like nodes are defined as class and the libxml2 wrappers maps the <p>Contexts like nodes are defined as class and the libxml2 wrappers maps the
C function interfaces in terms of objects method as much as possible. The C function interfaces in terms of objects method as much as possible. The
@ -225,12 +224,12 @@ ctxt.parseChunk(&quot;/&gt;&quot;, 2, 1)
doc = ctxt.doc() doc = ctxt.doc()
doc.freeDoc()</pre> doc.freeDoc()</pre>
<p>The context is created with a speciall call based on the <p>The context is created with a special call based on the
xmlCreatePushParser() from the C library. The first argument is an optional xmlCreatePushParser() from the C library. The first argument is an optional
SAX callback object, then the initial set of data, the lenght and the name of SAX callback object, then the initial set of data, the length and the name of
the resource in case URI-References need to be computed by the parser.</p> the resource in case URI-References need to be computed by the parser.</p>
<p>Then the data are pushed using the parseChunk() method, the last call <p>Then the data are pushed using the parseChunk() method, the last call
setting the thrird argument terminate to 1.</p> setting the third argument terminate to 1.</p>
<h3>pushSAX.py:</h3> <h3>pushSAX.py:</h3>
<p>this test show the use of the event based parsing interfaces. In this case <p>this test show the use of the event based parsing interfaces. In this case
the parser does not build a document, but provides callback information as the parser does not build a document, but provides callback information as
@ -283,19 +282,19 @@ reference = &quot;startDocument:startElement foo {'url': 'tst'}:&quot; + \
&quot;characters: bar:endElement foo:endDocument:&quot; &quot;characters: bar:endElement foo:endDocument:&quot;
if log != reference: if log != reference:
print &quot;Error got: %s&quot; % log print &quot;Error got: %s&quot; % log
print &quot;Exprected: %s&quot; % reference</pre> print &quot;Expected: %s&quot; % reference</pre>
<p>The key object in that test is the handler, it provides a number of entry <p>The key object in that test is the handler, it provides a number of entry
points which can be called by the parser as it makes progresses to indicate points which can be called by the parser as it makes progresses to indicate
the information set obtained. The full set of callback is larger than what the information set obtained. The full set of callback is larger than what
the callback class in that specific example implements (see the SAX the callback class in that specific example implements (see the SAX
definition for a complete list). The wrapper will only call those supplied by definition for a complete list). The wrapper will only call those supplied by
the object when activated. The startElement receives the names of the element the object when activated. The startElement receives the names of the element
and a dictionnary containing the attributes carried by this element.</p> and a dictionary containing the attributes carried by this element.</p>
<p>Also note that the reference string generated from the callback shows a <p>Also note that the reference string generated from the callback shows a
single character call even though the string &quot;bar&quot; is passed to the parser single character call even though the string &quot;bar&quot; is passed to the parser
from 2 different call to parseChunk()</p> from 2 different call to parseChunk()</p>
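Since the number of characters callbacks is not tied to the number of chunks pushed, a handler that wants the whole text usually buffers it. The following is a rough sketch of that idea using the standard xml.sax module rather than the libxml2 wrappers. LoggingHandler and log_events are hypothetical names, and the log format only loosely imitates the reference string of the test.

```python
import xml.sax


class LoggingHandler(xml.sax.ContentHandler):
    """Hypothetical handler: logs events and merges adjacent text callbacks."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def _flush_text(self):
        # Emit buffered text as one entry, however many callbacks delivered it.
        if self._text:
            self.events.append("characters: " + "".join(self._text))
            self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        # attrs is a mapping-like object; copy it into a plain dictionary.
        self.events.append("startElement %s %r" % (name, dict(attrs.items())))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append("endElement " + name)


def log_events(chunks):
    """Feed the chunks to a stdlib incremental parser and return the log."""
    handler = LoggingHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return handler.events
```

With this buffering the log does not depend on where the input was split between pushes.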
<h3>xpath.py:</h3> <h3>xpath.py:</h3>
<p>This is a basic test of XPath warppers support</p> <p>This is a basic test of XPath wrappers support</p>
<pre>import libxml2 <pre>import libxml2
doc = libxml2.parseFile(&quot;tst.xml&quot;) doc = libxml2.parseFile(&quot;tst.xml&quot;)
@ -313,7 +312,7 @@ ctxt.xpathFreeContext()</pre>
expression on it. The xpathEval() method execute an XPath query and returns expression on it. The xpathEval() method execute an XPath query and returns
the result mapped in a Python way. String and numbers are natively converted, the result mapped in a Python way. String and numbers are natively converted,
and node sets are returned as a tuple of libxml2 Python nodes wrappers. Like and node sets are returned as a tuple of libxml2 Python nodes wrappers. Like
the document, the XPath context need to be freed explicitely, also not that the document, the XPath context need to be freed explicitly, also not that
the result of the XPath query may point back to the document tree and hence the result of the XPath query may point back to the document tree and hence
the document must be freed after the result of the query is used.</p> the document must be freed after the result of the query is used.</p>
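For readers without the bindings at hand, the general shape of "evaluate a path, get nodes back" can be tried with Python's standard xml.etree.ElementTree module. This is a stand-in and not the libxml2 API: ElementTree only supports a limited subset of XPath, returns a list rather than a tuple, and needs no explicit freeing of a context or document. The helper name find_texts is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_texts(xml_text, path):
    """Return the text of every element matched by a (limited) XPath path."""
    root = ET.fromstring(xml_text)
    # findall() accepts the XPath subset documented for ElementTree.
    return [node.text for node in root.findall(path)]
```

The returned values are copied out of the tree, which sidesteps the lifetime concern described above for libxml2 results that point back into the document.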
<h3>xpathext.py:</h3> <h3>xpathext.py:</h3>
@ -333,7 +332,7 @@ if res != 2:
doc.freeDoc() doc.freeDoc()
ctxt.xpathFreeContext()</pre> ctxt.xpathFreeContext()</pre>
<p>Note how the extension function is registered with the context (but that <p>Note how the extension function is registered with the context (but that
part is not yet finalized, ths may change slightly in the future).</p> part is not yet finalized, this may change slightly in the future).</p>
<h3>tstxpath.py:</h3> <h3>tstxpath.py:</h3>
<p>This test is similar to the previous one but shows how the extension <p>This test is similar to the previous one but shows how the extension
function can access the XPath evaluation context:</p> function can access the XPath evaluation context:</p>
@ -363,7 +362,7 @@ else:
print &quot;Memory leak %d bytes&quot; % (libxml2.debugMemory(1)) print &quot;Memory leak %d bytes&quot; % (libxml2.debugMemory(1))
libxml2.dumpMemory()</pre> libxml2.dumpMemory()</pre>
<p>Those activate the memory debugging interface of libxml2 where all <p>Those activate the memory debugging interface of libxml2 where all
alloacted block in the library are tracked. The prologue then cleans up the allocated block in the library are tracked. The prologue then cleans up the
library state and checks that all allocated memory has been freed. If not it library state and checks that all allocated memory has been freed. If not it
calls dumpMemory() which saves that list in a <code>.memdump</code> file.</p> calls dumpMemory() which saves that list in a <code>.memdump</code> file.</p>
<p><a href="bugs.html">Daniel Veillard</a></p> <p><a href="bugs.html">Daniel Veillard</a></p>


@ -86,7 +86,7 @@ A:link, A:visited, A:active { text-decoration: underline }
</table> </table>
</td></tr></table></td> </td></tr></table></td>
<td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd"> <td valign="top" bgcolor="#8b7765"><table border="0" cellspacing="0" cellpadding="1" width="100%"><tr><td><table border="0" cellspacing="0" cellpadding="1" width="100%" bgcolor="#000000"><tr><td><table border="0" cellpadding="3" cellspacing="1" width="100%"><tr><td bgcolor="#fffacd">
<p>Starting with 2.4.7, libxml makes provisions to ensure that concurent <p>Starting with 2.4.7, libxml makes provisions to ensure that concurrent
threads can safely work in parallel parsing different documents. There is threads can safely work in parallel parsing different documents. There is
however a couple of things to do to ensure it:</p> however a couple of things to do to ensure it:</p>
<ul> <ul>


@ -115,14 +115,14 @@ mail</a>:</p>
select the right parameters libxml2</li> select the right parameters libxml2</li>
<li>Node <strong>childs</strong> field has been renamed <li>Node <strong>childs</strong> field has been renamed
<strong>children</strong> so s/childs/children/g should be applied <strong>children</strong> so s/childs/children/g should be applied
(probablility of having &quot;childs&quot; anywere else is close to 0+</li> (probability of having &quot;childs&quot; anywhere else is close to 0+</li>
<li>The document don't have anymore a <strong>root</strong> element it has <li>The document don't have anymore a <strong>root</strong> element it has
been replaced by <strong>children</strong> and usually you will get a been replaced by <strong>children</strong> and usually you will get a
list of element here. For example a Dtd element for the internal subset list of element here. For example a Dtd element for the internal subset
and it's declaration may be found in that list, as well as processing and it's declaration may be found in that list, as well as processing
instructions or comments found before or after the document root element. instructions or comments found before or after the document root element.
Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of
a document. Alternatively if you are sure to not reference Dtds nor have a document. Alternatively if you are sure to not reference DTDs nor have
PIs or comments before or after the root element PIs or comments before or after the root element
s/-&gt;root/-&gt;children/g will probably do it.</li> s/-&gt;root/-&gt;children/g will probably do it.</li>
<li>The white space issue, this one is more complex, unless special case of <li>The white space issue, this one is more complex, unless special case of
@ -136,9 +136,9 @@ mail</a>:</p>
relying on a special (and possibly broken) set of heuristics of relying on a special (and possibly broken) set of heuristics of
libxml to detect ignorable blanks. Don't complain if it breaks or libxml to detect ignorable blanks. Don't complain if it breaks or
make your application not 100% clean w.r.t. to it's input.</li> make your application not 100% clean w.r.t. to it's input.</li>
<li>the Right Way: change you code to accept possibly unsignificant <li>the Right Way: change you code to accept possibly insignificant
blanks characters, or have your tree populated with weird blank text blanks characters, or have your tree populated with weird blank text
nodes. You can spot them using the comodity function nodes. You can spot them using the commodity function
<strong>xmlIsBlankNode(node)</strong> returning 1 for such blank <strong>xmlIsBlankNode(node)</strong> returning 1 for such blank
nodes.</li> nodes.</li>
</ol> </ol>
@ -154,12 +154,12 @@ mail</a>:</p>
<p>output to generate you compile commands this will probably work out of <p>output to generate you compile commands this will probably work out of
the box</p> the box</p>
</li> </li>
<li>xmlDetectCharEncoding takes an extra argument indicating the lenght in <li>xmlDetectCharEncoding takes an extra argument indicating the length in
byte of the head of the document available for character detection.</li> byte of the head of the document available for character detection.</li>
</ol> </ol>
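Two of the points in the list above, that the first child of a document is not necessarily its root element and that formatting blanks show up as text nodes, can be observed with any DOM-like tree. The sketch below uses Python's standard xml.dom.minidom as a stand-in for the libxml2 tree. The helper is_blank_text is a hypothetical analogue of xmlIsBlankNode(node), documentElement plays the role of xmlDocGetRootElement(doc), and SAMPLE is made-up input.

```python
from xml.dom import Node, minidom

# Made-up input: a comment before the root, and indentation between items.
SAMPLE = "<!-- header --><root>\n  <item>a</item>\n  <item>b</item>\n</root>"


def is_blank_text(node):
    """Hypothetical analogue of xmlIsBlankNode(): whitespace-only text node."""
    return node.nodeType == Node.TEXT_NODE and node.data.strip() == ""


def significant_children(element):
    """Children of element with whitespace-only text nodes skipped."""
    return [child for child in element.childNodes if not is_blank_text(child)]


doc = minidom.parseString(SAMPLE)
first = doc.firstChild        # the comment, not the root element
root = doc.documentElement    # the actual root element
```

Walking root.childNodes directly visits the blank text nodes as well, while significant_children(root) yields only the two item elements.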
<h3>Ensuring both libxml-1.x and libxml-2.x compatibility</h3> <h3>Ensuring both libxml-1.x and libxml-2.x compatibility</h3>
<p>Two new version of libxml (1.8.11) and libxml2 (2.3.4) have been released <p>Two new version of libxml (1.8.11) and libxml2 (2.3.4) have been released
to allow smoth upgrade of existing libxml v1code while retaining to allow smooth upgrade of existing libxml v1code while retaining
compatibility. They offers the following:</p> compatibility. They offers the following:</p>
<ol> <ol>
<li>similar include naming, one should use <li>similar include naming, one should use
@ -175,17 +175,17 @@ compatibility. They offers the following:</p>
following:</p> following:</p>
<ol> <ol>
<li>install the libxml-1.8.8 (and libxml-devel-1.8.8) packages</li> <li>install the libxml-1.8.8 (and libxml-devel-1.8.8) packages</li>
<li>find all occurences where the xmlDoc <strong>root</strong> field is <li>find all occurrences where the xmlDoc <strong>root</strong> field is
used and change it to <strong>xmlRootNode</strong> used and change it to <strong>xmlRootNode</strong>
</li> </li>
<li>similary find all occurences where the xmlNode <strong>childs</strong> <li>similarly find all occurrences where the xmlNode <strong>childs</strong>
field is used and change it to <strong>xmlChildrenNode</strong> field is used and change it to <strong>xmlChildrenNode</strong>
</li> </li>
<li>add a <strong>LIBXML_TEST_VERSION</strong> macro somewhere in your <li>add a <strong>LIBXML_TEST_VERSION</strong> macro somewhere in your
<strong>main()</strong> or in the library init entry point</li> <strong>main()</strong> or in the library init entry point</li>
<li>Recompile, check compatibility, it should still work</li> <li>Recompile, check compatibility, it should still work</li>
<li>Change your configure script to look first for xml2-config and fall back <li>Change your configure script to look first for xml2-config and fall back
using xml-config . Use the --cflags and --libs ouptut of the command as using xml-config . Use the --cflags and --libs output of the command as
the Include and Linking parameters needed to use libxml.</li> the Include and Linking parameters needed to use libxml.</li>
<li>install libxml2-2.3.x and libxml2-devel-2.3.x (libxml-1.8.y and <li>install libxml2-2.3.x and libxml2-devel-2.3.x (libxml-1.8.y and
libxml-devel-1.8.y can be kept simultaneously)</li> libxml-devel-1.8.y can be kept simultaneously)</li>


@ -17,7 +17,7 @@ site</a></h1>
<p></p> <p></p>
<p>Libxml is the XML C library developped for the Gnome project. XML itself <p>Libxml is the XML C library developed for the Gnome project. XML itself
is a metalanguage to design markup languages, i.e. text language where is a metalanguage to design markup languages, i.e. text language where
semantic and structure are added to the content using extra "markup" semantic and structure are added to the content using extra "markup"
information enclosed between angle bracket. HTML is the most well-known information enclosed between angle bracket. HTML is the most well-known
@ -103,7 +103,7 @@ CygWin, MacOs, MacOsX, RISC Os, OS/2, VMS, QNX, MVS, ...)</p>
<h2><a name="Introducti">Introduction</a></h2> <h2><a name="Introducti">Introduction</a></h2>
<p>This document describes libxml, the <a <p>This document describes libxml, the <a
href="http://www.w3.org/XML/">XML</a> C library developped for the <a href="http://www.w3.org/XML/">XML</a> C library developed for the <a
href="http://www.gnome.org/">Gnome</a> project. <a href="http://www.gnome.org/">Gnome</a> project. <a
href="http://www.w3.org/XML/">XML is a standard</a> for building tag-based href="http://www.w3.org/XML/">XML is a standard</a> for building tag-based
structured documents/data.</p> structured documents/data.</p>
@ -121,17 +121,17 @@ structured documents/data.</p>
<li>It is written in plain C, making as few assumptions as possible, and <li>It is written in plain C, making as few assumptions as possible, and
sticking closely to ANSI C/POSIX for easy embedding. Works on sticking closely to ANSI C/POSIX for easy embedding. Works on
Linux/Unix/Windows, ported to a number of other platforms.</li> Linux/Unix/Windows, ported to a number of other platforms.</li>
<li>Basic support for HTTP and FTP client allowing aplications to fetch <li>Basic support for HTTP and FTP client allowing applications to fetch
remote resources</li> remote resources</li>
<li>The design is modular, most of the extensions can be compiled out.</li> <li>The design is modular, most of the extensions can be compiled out.</li>
<li>The internal document repesentation is as close as possible to the <a <li>The internal document representation is as close as possible to the <a
href="http://www.w3.org/DOM/">DOM</a> interfaces.</li> href="http://www.w3.org/DOM/">DOM</a> interfaces.</li>
<li>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX <li>Libxml also has a <a href="http://www.megginson.com/SAX/index.html">SAX
like interface</a>; the interface is designed to be compatible with <a like interface</a>; the interface is designed to be compatible with <a
href="http://www.jclark.com/xml/expat.html">Expat</a>.</li> href="http://www.jclark.com/xml/expat.html">Expat</a>.</li>
<li>This library is released under the <a <li>This library is released under the <a
href="http://www.opensource.org/licenses/mit-license.html">MIT href="http://www.opensource.org/licenses/mit-license.html">MIT
Licence</a> see the Copyright file in the distribution for the precise License</a> see the Copyright file in the distribution for the precise
wording.</li> wording.</li>
</ul> </ul>
@ -144,22 +144,22 @@ libxml2</p>
<p>Table of Content:</p> <p>Table of Content:</p>
<ul> <ul>
<li><a href="FAQ.html#Licence">Licence(s)</a></li> <li><a href="FAQ.html#License">License(s)</a></li>
<li><a href="FAQ.html#Installati">Installation</a></li> <li><a href="FAQ.html#Installati">Installation</a></li>
<li><a href="FAQ.html#Compilatio">Compilation</a></li> <li><a href="FAQ.html#Compilatio">Compilation</a></li>
<li><a href="FAQ.html#Developer">Developer corner</a></li> <li><a href="FAQ.html#Developer">Developer corner</a></li>
</ul> </ul>
<h3><a name="Licence">Licence</a>(s)</h3> <h3><a name="License">License</a>(s)</h3>
<ol> <ol>
<li><em>Licensing Terms for libxml</em> <li><em>Licensing Terms for libxml</em>
<p>libxml is released under the <a <p>libxml is released under the <a
href="http://www.opensource.org/licenses/mit-license.html">MIT href="http://www.opensource.org/licenses/mit-license.html">MIT
Licence</a>, see the file Copyright in the distribution for the precise License</a>, see the file Copyright in the distribution for the precise
wording</p> wording</p>
</li> </li>
<li><em>Can I embed libxml in a proprietary application ?</em> <li><em>Can I embed libxml in a proprietary application ?</em>
<p>Yes. The MIT Licence allows you to also keep proprietary the changes <p>Yes. The MIT License allows you to also keep proprietary the changes
you made to libxml, but it would be graceful to provide back bug fixes and you made to libxml, but it would be graceful to provide back bug fixes and
improvements as patches for possible incorporation in the main improvements as patches for possible incorporation in the main
development tree</p> development tree</p>
@ -175,7 +175,7 @@ libxml2</p>
<p>The original distribution comes from <a <p>The original distribution comes from <a
href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or <a href="ftp://rpmfind.net/pub/libxml/">rpmfind.net</a> or <a
href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">gnome.org</a></p> href="ftp://ftp.gnome.org/pub/GNOME/stable/sources/libxml/">gnome.org</a></p>
<p>Most linux and Bsd distribution includes libxml, this is probably the <p>Most Linux and BSD distributions include libxml, this is probably the
safer way for end-users</p> safer way for end-users</p>
<p>David Doolin provides precompiled Windows versions at <a <p>David Doolin provides precompiled Windows versions at <a
href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/ ">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a></p> href="http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/ ">http://www.ce.berkeley.edu/~doolin/code/libxmlwin32/</a></p>
@ -208,8 +208,8 @@ libxml2</p>
libxml.so.0</p> libxml.so.0</p>
</li> </li>
<li><em>I can't install the libxml(2) RPM package due to failed <li><em>I can't install the libxml(2) RPM package due to failed
dependancies</em> dependencies</em>
<p>The most generic solution is to refetch the latest src.rpm , and <p>The most generic solution is to re-fetch the latest src.rpm , and
rebuild it locally with</p> rebuild it locally with</p>
<p><code>rpm --rebuild libxml(2)-xxx.src.rpm</code></p> <p><code>rpm --rebuild libxml(2)-xxx.src.rpm</code></p>
<p>if everything goes well it will generate two binary rpm (one providing <p>if everything goes well it will generate two binary rpm (one providing
@ -244,7 +244,7 @@ libxml2</p>
highly portable and available widely compression library</li> highly portable and available widely compression library</li>
<li>iconv: a powerful character encoding conversion library. It's <li>iconv: a powerful character encoding conversion library. It's
included by default on recent glibc libraries, so it doesn't need to included by default on recent glibc libraries, so it doesn't need to
be installed specifically on linux. It seems it's now <a be installed specifically on Linux. It seems it's now <a
href="http://www.opennc.org/onlinepubs/7908799/xsh/iconv.html">part href="http://www.opennc.org/onlinepubs/7908799/xsh/iconv.html">part
of the official UNIX</a> specification. Here is one <a of the official UNIX</a> specification. Here is one <a
href="http://clisp.cons.org/~haible/packages-libiconv.html">implementation href="http://clisp.cons.org/~haible/packages-libiconv.html">implementation
@ -304,7 +304,7 @@ libxml2</p>
<p><em>I want to the get the content of the first node (node with the <p><em>I want to the get the content of the first node (node with the
CommFlag="0")</em></p> CommFlag="0")</em></p>
<p><em>so I did it as following;</em></p> <p><em>so I did it as following;</em></p>
<pre>xmlNodePtr pode; <pre>xmlNodePtr pnode;
pnode=pxmlDoc-&gt;children-&gt;children;</pre> pnode=pxmlDoc-&gt;children-&gt;children;</pre>
<p><em>but it does not work. If I change it to</em></p> <p><em>but it does not work. If I change it to</em></p>
<pre>pnode=pxmlDoc-&gt;children-&gt;children-&gt;next;</pre> <pre>pnode=pxmlDoc-&gt;children-&gt;children-&gt;next;</pre>
@ -313,7 +313,7 @@ pnode=pxmlDoc-&gt;children-&gt;children;</pre>
<p>In XML all characters in the content of the document are significant <p>In XML all characters in the content of the document are significant
<strong>including blanks and formatting line breaks</strong>.</p> <strong>including blanks and formatting line breaks</strong>.</p>
<p>The extra nodes you are wondering about are just that, text nodes with <p>The extra nodes you are wondering about are just that, text nodes with
the formatting spaces wich are part of the document but that people tend the formatting spaces which are part of the document but that people tend
to forget. There is a function <a to forget. There is a function <a
href="http://xmlsoft.org/html/libxml-parser.html">xmlKeepBlanksDefault href="http://xmlsoft.org/html/libxml-parser.html">xmlKeepBlanksDefault
()</a> to remove those at parse time, but that's an heuristic, and its ()</a> to remove those at parse time, but that's an heuristic, and its
@ -353,7 +353,7 @@ pnode=pxmlDoc-&gt;children-&gt;children;</pre>
<li>check more deeply the <a href="html/libxml-lib.html">existing <li>check more deeply the <a href="html/libxml-lib.html">existing
generated doc</a></li> generated doc</a></li>
<li>looks for examples of use for libxml function using the Gnome code <li>looks for examples of use for libxml function using the Gnome code
for example the following will query the full Gnome CVs base for the for example the following will query the full Gnome CVS base for the
use of the <strong>xmlAddChild()</strong> function: use of the <strong>xmlAddChild()</strong> function:
<p><a <p><a
href="http://cvs.gnome.org/lxr/search?string=xmlAddChild">http://cvs.gnome.org/lxr/search?string=xmlAddChild</a></p> href="http://cvs.gnome.org/lxr/search?string=xmlAddChild">http://cvs.gnome.org/lxr/search?string=xmlAddChild</a></p>
@ -372,7 +372,7 @@ pnode=pxmlDoc-&gt;children-&gt;children;</pre>
<p>libxml is written in pure C in order to allow easy reuse on a number <p>libxml is written in pure C in order to allow easy reuse on a number
of platforms, including embedded systems. I don't intend to convert to of platforms, including embedded systems. I don't intend to convert to
C++.</p> C++.</p>
<p>There is however a few C++ wrappers which may fullfill your needs:</p> <p>There is however a few C++ wrappers which may fulfill your needs:</p>
<ul> <ul>
<li>by Ari Johnson &lt;ari@btigate.com&gt;: <li>by Ari Johnson &lt;ari@btigate.com&gt;:
<p>Website: <a <p>Website: <a
@ -391,7 +391,7 @@ pnode=pxmlDoc-&gt;children-&gt;children;</pre>
initial parsing time or documents who have been built from scratch using initial parsing time or documents who have been built from scratch using
the API. Use the <a the API. Use the <a
href="http://xmlsoft.org/html/libxml-valid.html#XMLVALIDATEDTD">xmlValidateDtd()</a> href="http://xmlsoft.org/html/libxml-valid.html#XMLVALIDATEDTD">xmlValidateDtd()</a>
function. It is also possible to simply add a Dtd to an existing function. It is also possible to simply add a DTD to an existing
document:</p> document:</p>
<pre>xmlDocPtr doc; /* your existing document */ <pre>xmlDocPtr doc; /* your existing document */
xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */ xmlDtdPtr dtd = xmlParseDTD(NULL, filename_of_dtd); /* parse the DTD */
@ -461,13 +461,13 @@ posting</span></strong>:</p>
version</a>, and that the problem still shows up in those</li> version</a>, and that the problem still shows up in those</li>
<li>check the <a href="http://mail.gnome.org/archives/xml/">list <li>check the <a href="http://mail.gnome.org/archives/xml/">list
archives</a> to see if the problem was reported already, in this case archives</a> to see if the problem was reported already, in this case
there is probably a fix available, similary check the <a there is probably a fix available, similarly check the <a
href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">registered href="http://bugzilla.gnome.org/buglist.cgi?product=libxml">registered
open bugs</a></li> open bugs</a></li>
<li>make sure you can reproduce the bug with xmllint or one of the test <li>make sure you can reproduce the bug with xmllint or one of the test
programs found in source in the distribution</li> programs found in source in the distribution</li>
<li>Please send the command showing the error as well as the input (as an <li>Please send the command showing the error as well as the input (as an
attachement)</li> attachment)</li>
</ul> </ul>
<p>Then send the bug with associated informations to reproduce it to the <a <p>Then send the bug with associated informations to reproduce it to the <a
@ -483,8 +483,8 @@ probably be processed faster.</p>
href="http://mail.gnome.org/archives/xml/">the list archive</a> may actually href="http://mail.gnome.org/archives/xml/">the list archive</a> may actually
provide the answer, I usually send source samples when answering libxml usage provide the answer, I usually send source samples when answering libxml usage
questions. The <a href="http://xmlsoft.org/html/book1.html">auto-generated questions. The <a href="http://xmlsoft.org/html/book1.html">auto-generated
documentantion</a> is not as polished as I would like (i need to learn more documentation</a> is not as polished as I would like (i need to learn more
about Docbook), but it's a good starting point.</p> about DocBook), but it's a good starting point.</p>
<h2><a name="help">How to help</a></h2> <h2><a name="help">How to help</a></h2>
@ -589,7 +589,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.4.20: Apr 15 2002</h3> <h3>2.4.20: Apr 15 2002</h3>
<ul> <ul>
<li>bug fixes: file descriptor leak, XPath, HTML ouput, DTD validation</li> <li>bug fixes: file descriptor leak, XPath, HTML output, DTD validation</li>
<li>XPath conformance testing by Richard Jinks</li> <li>XPath conformance testing by Richard Jinks</li>
<li>Portability fixes: Solaris, MPE/iX, Windows, OSF/1, python bindings, <li>Portability fixes: Solaris, MPE/iX, Windows, OSF/1, python bindings,
libxml.m4</li> libxml.m4</li>
@ -607,7 +607,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.4.18: Mar 18 2002</h3> <h3>2.4.18: Mar 18 2002</h3>
<ul> <ul>
<li>bug fixes: tree, SAX, canonicalization, validation, portability, <li>bug fixes: tree, SAX, canonicalization, validation, portability,
xpath</li> XPath</li>
<li>removed the --with-buffer option it was becoming unmaintainable</li> <li>removed the --with-buffer option it was becoming unmaintainable</li>
<li>serious cleanup of the Python makefiles</li> <li>serious cleanup of the Python makefiles</li>
<li>speedup patch to XPath very effective for DocBook stylesheets</li> <li>speedup patch to XPath very effective for DocBook stylesheets</li>
@ -620,7 +620,7 @@ it's actually not compiled in by default. The real fixes are:</p>
XPath"</li> XPath"</li>
<li>fixed/improved the Python wrappers, added more examples and more <li>fixed/improved the Python wrappers, added more examples and more
regression tests, XPath extension functions can now return node-sets</li> regression tests, XPath extension functions can now return node-sets</li>
<li>added the XML Canonalization support from Aleksey Sanin</li> <li>added the XML Canonicalization support from Aleksey Sanin</li>
</ul> </ul>
<h3>2.4.16: Feb 20 2002</h3> <h3>2.4.16: Feb 20 2002</h3>
@ -639,10 +639,10 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.4.14: Feb 8 2002</h3> <h3>2.4.14: Feb 8 2002</h3>
<ul> <ul>
<li>Change of Licence to the <a <li>Change of License to the <a
href="http://www.opensource.org/licenses/mit-license.html">MIT href="http://www.opensource.org/licenses/mit-license.html">MIT
Licence</a> basisally for integration in XFree86 codebase, and removing License</a> basically for integration in XFree86 codebase, and removing
confusion around the previous dual-licencing</li> confusion around the previous dual-licensing</li>
<li>added Python bindings, beta software but should already be quite <li>added Python bindings, beta software but should already be quite
complete</li> complete</li>
<li>a large number of fixes and cleanups, especially for all tree <li>a large number of fixes and cleanups, especially for all tree
@ -725,7 +725,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>portability and configure fixes</li> <li>portability and configure fixes</li>
<li>an infinite loop on the HTML parser was removed (William)</li> <li>an infinite loop on the HTML parser was removed (William)</li>
<li>Windows makefile patches from Igor</li> <li>Windows makefile patches from Igor</li>
<li>fixed half a dozen bugs reported fof libxml or libxslt</li> <li>fixed half a dozen bugs reported for libxml or libxslt</li>
<li>updated xmlcatalog to be able to modify SGML super catalogs</li> <li>updated xmlcatalog to be able to modify SGML super catalogs</li>
</ul> </ul>
@ -761,7 +761,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<ul> <ul>
<li>adds xmlLineNumbersDefault() to control line number generation</li> <li>adds xmlLineNumbersDefault() to control line number generation</li>
<li>lot of bug fixes</li> <li>lot of bug fixes</li>
<li>the Microsoft MSC projects files shuld now be up to date</li> <li>the Microsoft MSC projects files should now be up to date</li>
<li>inheritance of namespaces from DTD defaulted attributes</li> <li>inheritance of namespaces from DTD defaulted attributes</li>
<li>fixes a serious potential security bug</li> <li>fixes a serious potential security bug</li>
<li>added a --format option to xmllint</li> <li>added a --format option to xmllint</li>
@ -779,21 +779,21 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.4.0: July 10 2001</h3> <h3>2.4.0: July 10 2001</h3>
<ul> <ul>
<li>Fixed a few bugs in XPath, validation, and tree handling.</li> <li>Fixed a few bugs in XPath, validation, and tree handling.</li>
<li>Fixed XML Base implementation, added a coupel of examples to the <li>Fixed XML Base implementation, added a couple of examples to the
regression tests</li> regression tests</li>
<li>A bit of cleanup</li> <li>A bit of cleanup</li>
</ul> </ul>
<h3>2.3.14: July 5 2001</h3> <h3>2.3.14: July 5 2001</h3>
<ul> <ul>
<li>fixed some entities problems and reduce mem requirement when <li>fixed some entities problems and reduce memory requirement when
substituing them</li> substituting them</li>
<li>lots of improvements in the XPath queries interpreter can be <li>lots of improvements in the XPath queries interpreter can be
substancially faster</li> substantially faster</li>
<li>Makefiles and configure cleanups</li> <li>Makefiles and configure cleanups</li>
<li>Fixes to XPath variable eval, and compare on empty node set</li> <li>Fixes to XPath variable eval, and compare on empty node set</li>
<li>HTML tag closing bug fixed</li> <li>HTML tag closing bug fixed</li>
<li>Fixed an URI reference computating problem when validating</li> <li>Fixed an URI reference computation problem when validating</li>
</ul> </ul>
<h3>2.3.13: June 28 2001</h3> <h3>2.3.13: June 28 2001</h3>
@ -854,9 +854,9 @@ it's actually not compiled in by default. The real fixes are:</p>
<p>Lots of bugfixes, and added a basic SGML catalog support:</p> <p>Lots of bugfixes, and added a basic SGML catalog support:</p>
<ul> <ul>
  <li>HTML push bugfix #54891 and another patch from Jonas Borgström</li>   <li>HTML push bugfix #54891 and another patch from Jonas Borgström</li>
<li>some serious speed optimisation again</li> <li>some serious speed optimization again</li>
<li>some documentation cleanups</li> <li>some documentation cleanups</li>
<li>trying to get better linking on solaris (-R)</li> <li>trying to get better linking on Solaris (-R)</li>
<li>XPath API cleanup from Thomas Broyer</li> <li>XPath API cleanup from Thomas Broyer</li>
<li>Validation bug fixed #54631, added a patch from Gary Pennington, fixed <li>Validation bug fixed #54631, added a patch from Gary Pennington, fixed
xmlValidGetValidElements()</li> xmlValidGetValidElements()</li>
@ -891,12 +891,12 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.3.7: April 22 2001</h3> <h3>2.3.7: April 22 2001</h3>
<ul> <ul>
<li>lots of small bug fixes, corrected XPointer</li> <li>lots of small bug fixes, corrected XPointer</li>
<li>Non determinist content model validation support</li> <li>Non deterministic content model validation support</li>
<li>added xmlDocCopyNode for gdome2</li> <li>added xmlDocCopyNode for gdome2</li>
<li>revamped the way the HTML parser handles end of tags</li> <li>revamped the way the HTML parser handles end of tags</li>
<li>XPath: corrctions of namespacessupport and number formatting</li> <li>XPath: corrections of namespaces support and number formatting</li>
<li>Windows: Igor Zlatkovic patches for MSC compilation</li> <li>Windows: Igor Zlatkovic patches for MSC compilation</li>
<li>HTML ouput fixes from P C Chow and William M. Brack</li> <li>HTML output fixes from P C Chow and William M. Brack</li>
<li>Improved validation speed sensible for DocBook</li> <li>Improved validation speed sensible for DocBook</li>
<li>fixed a big bug with ID declared in external parsed entities</li> <li>fixed a big bug with ID declared in external parsed entities</li>
<li>portability fixes, update of Trio from Bjorn Reese</li> <li>portability fixes, update of Trio from Bjorn Reese</li>
@ -937,7 +937,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>Bjorn fixed XPath node collection and Number formatting</li> <li>Bjorn fixed XPath node collection and Number formatting</li>
<li>Fixed a loop reported in the HTML parsing</li> <li>Fixed a loop reported in the HTML parsing</li>
<li>blank space are reported even if the Dtd content model proves that they <li>blank space are reported even if the Dtd content model proves that they
are formatting spaces, this is for XmL conformance</li> are formatting spaces, this is for XML conformance</li>
</ul> </ul>
<h3>2.3.3: Mar 1 2001</h3> <h3>2.3.3: Mar 1 2001</h3>
@ -979,7 +979,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>added HTML to the RPM packages</li> <li>added HTML to the RPM packages</li>
<li>tree copying bugfixes</li> <li>tree copying bugfixes</li>
<li>updates to Windows makefiles</li> <li>updates to Windows makefiles</li>
<li>optimisation patch from Bjorn Reese</li> <li>optimization patch from Bjorn Reese</li>
</ul> </ul>
<h3>2.2.11: Jan 4 2001</h3> <h3>2.2.11: Jan 4 2001</h3>
@ -1063,7 +1063,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>cleanup of entity handling code</li> <li>cleanup of entity handling code</li>
<li>overall review of all loops in the parsers, all sprintf usage has been <li>overall review of all loops in the parsers, all sprintf usage has been
checked too</li> checked too</li>
<li>Far better handling of larges Dtd. Validating against Docbook XML Dtd <li>Far better handling of larges Dtd. Validating against DocBook XML Dtd
works smoothly now.</li> works smoothly now.</li>
</ul> </ul>
@ -1116,7 +1116,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.1.0 and 1.8.8: June 29 2000</h3> <h3>2.1.0 and 1.8.8: June 29 2000</h3>
<ul> <ul>
<li>1.8.8 is mostly a comodity package for upgrading to libxml2 accoding to <li>1.8.8 is mostly a commodity package for upgrading to libxml2 according to
<a href="upgrade.html">new instructions</a>. It fixes a nasty problem <a href="upgrade.html">new instructions</a>. It fixes a nasty problem
about &amp;#38; charref parsing</li> about &amp;#38; charref parsing</li>
<li>2.1.0 also ease the upgrade from libxml v1 to the recent version. it <li>2.1.0 also ease the upgrade from libxml v1 to the recent version. it
@ -1125,7 +1125,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<li>added xmlStopParser() to stop parsing</li> <li>added xmlStopParser() to stop parsing</li>
<li>improved a lot parsing speed when there is large CDATA blocs</li> <li>improved a lot parsing speed when there is large CDATA blocs</li>
<li>includes XPath patches provided by Picdar Technology</li> <li>includes XPath patches provided by Picdar Technology</li>
<li>tried to fix as much as possible DtD validation and namespace <li>tried to fix as much as possible DTD validation and namespace
related problems</li> related problems</li>
<li>output to a given encoding has been added/tested</li> <li>output to a given encoding has been added/tested</li>
<li>lot of various fixes</li> <li>lot of various fixes</li>
@ -1136,8 +1136,8 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>2.0.0: Apr 12 2000</h3> <h3>2.0.0: Apr 12 2000</h3>
<ul> <ul>
<li>First public release of libxml2. If you are using libxml, it's a good <li>First public release of libxml2. If you are using libxml, it's a good
idea to check the 1.x to 2.x upgrade instructions. NOTE: while initally idea to check the 1.x to 2.x upgrade instructions. NOTE: while initially
scheduled for Apr 3 the relase occured only on Apr 12 due to massive scheduled for Apr 3 the release occurred only on Apr 12 due to massive
workload.</li> workload.</li>
<li>The include are now located under $prefix/include/libxml (instead of <li>The include are now located under $prefix/include/libxml (instead of
$prefix/include/gnome-xml), they also are referenced by $prefix/include/gnome-xml), they also are referenced by
@ -1177,17 +1177,17 @@ it's actually not compiled in by default. The real fixes are:</p>
<ul> <ul>
<li>fix I18N support. ISO-Latin-x/UTF-8/UTF-16 (nearly) seems correctly <li>fix I18N support. ISO-Latin-x/UTF-8/UTF-16 (nearly) seems correctly
handled now</li> handled now</li>
<li>Better handling of entities, especially well formedness checking <li>Better handling of entities, especially well-formedness checking
and proper PEref extensions in external subsets</li> and proper PEref extensions in external subsets</li>
<li>DTD conditional sections</li> <li>DTD conditional sections</li>
<li>Validation now correcly handle entities content</li> <li>Validation now correctly handle entities content</li>
<li><a href="http://rpmfind.net/tools/gdome/messages/0039.html">change <li><a href="http://rpmfind.net/tools/gdome/messages/0039.html">change
structures to accomodate DOM</a></li> structures to accommodate DOM</a></li>
</ul> </ul>
</li> </li>
<li>Serious progress were made toward compliance, <a <li>Serious progress were made toward compliance, <a
href="conf/result.html">here are the result of the test</a> against the href="conf/result.html">here are the result of the test</a> against the
OASIS testsuite (except the japanese tests since I don't support that OASIS testsuite (except the Japanese tests since I don't support that
encoding yet). This URL is rebuilt every couple of hours using the CVS encoding yet). This URL is rebuilt every couple of hours using the CVS
head version.</li> head version.</li>
</ul> </ul>
@ -1239,7 +1239,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<ul> <ul>
<li>a Push interface for the XML and HTML parsers</li> <li>a Push interface for the XML and HTML parsers</li>
<li>a shell-like interface to the document tree (try tester --shell :-)</li> <li>a shell-like interface to the document tree (try tester --shell :-)</li>
<li>lots of bug fixes and improvement added over XMas hollidays</li> <li>lots of bug fixes and improvement added over XMas holidays</li>
<li>fixed the DTD parsing code to work with the xhtml DTD</li> <li>fixed the DTD parsing code to work with the xhtml DTD</li>
<li>added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()</li> <li>added xmlRemoveProp(), xmlRemoveID() and xmlRemoveRef()</li>
<li>Fixed bugs in xmlNewNs()</li> <li>Fixed bugs in xmlNewNs()</li>
@ -1280,8 +1280,8 @@ it's actually not compiled in by default. The real fixes are:</p>
dataset from <a href="mailto:cnygard@bellatlantic.net">Carl Nygard</a>, dataset from <a href="mailto:cnygard@bellatlantic.net">Carl Nygard</a>,
configure with --with-buffers to enable them.</li> configure with --with-buffers to enable them.</li>
<li>attribute normalization, oops should have been added long ago !</li> <li>attribute normalization, oops should have been added long ago !</li>
<li>attributes defaulted from Dtds should be available, xmlSetProp() now <li>attributes defaulted from DTDs should be available, xmlSetProp() now
does entities escapting by default.</li> does entities escaping by default.</li>
</ul> </ul>
<h3>1.7.4: Oct 25 1999</h3> <h3>1.7.4: Oct 25 1999</h3>
@ -1295,7 +1295,7 @@ it's actually not compiled in by default. The real fixes are:</p>
<h3>1.7.3: Sep 29 1999</h3> <h3>1.7.3: Sep 29 1999</h3>
<ul> <ul>
<li>portability problems fixed</li> <li>portability problems fixed</li>
<li>snprintf was used unconditionnally, leading to link problems on system <li>snprintf was used unconditionally, leading to link problems on system
were it's not available, fixed</li> were it's not available, fixed</li>
</ul> </ul>
@ -1310,7 +1310,7 @@ it's actually not compiled in by default. The real fixes are:</p>
leading to troubles on platforms where it's a macro</li> leading to troubles on platforms where it's a macro</li>
</ul> </ul>
<h3>1.7.0: sep 23 1999</h3> <h3>1.7.0: Sep 23 1999</h3>
<ul> <ul>
<li>Added the ability to fetch remote DTD or parsed entities, see the <a <li>Added the ability to fetch remote DTD or parsed entities, see the <a
href="html/libxml-nanohttp.html">nanohttp</a> module.</li> href="html/libxml-nanohttp.html">nanohttp</a> module.</li>
@ -1351,7 +1351,7 @@ it ends with <code>/&gt;</code> rather than with <code>&gt;</code>. Note
that, for example, the image tag has no content (just an attribute) and is that, for example, the image tag has no content (just an attribute) and is
closed by ending the tag with <code>/&gt;</code>.</p> closed by ending the tag with <code>/&gt;</code>.</p>
<p>XML can be applied sucessfully to a wide range of uses, from long term <p>XML can be applied successfully to a wide range of uses, from long term
structured document maintenance (where it follows the steps of SGML) to structured document maintenance (where it follows the steps of SGML) to
simple data encoding mechanisms like configuration file formatting (glade), simple data encoding mechanisms like configuration file formatting (glade),
spreadsheets (gnumeric), or even shorter lived documents such as WebDAV where spreadsheets (gnumeric), or even shorter lived documents such as WebDAV where
@ -1397,8 +1397,8 @@ or libxslt wrappers or bindings:</p>
</li> </li>
<li><a <li><a
href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt
Sergeant</a> developped <a Sergeant</a> developed <a
href="http://axkit.org/download/">XML::LibXSLT</a>, a perl wrapper for href="http://axkit.org/download/">XML::LibXSLT</a>, a Perl wrapper for
libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML
application server</a></li> application server</a></li>
<li><a href="mailto:dkuhlman@cutter.rexx.com">Dave Kuhlman</a> provides and <li><a href="mailto:dkuhlman@cutter.rexx.com">Dave Kuhlman</a> provides and
@ -1421,7 +1421,7 @@ or libxslt wrappers or bindings:</p>
<li>There is support for libxml2 in the DOM module of PHP.</li> <li>There is support for libxml2 in the DOM module of PHP.</li>
</ul> </ul>
<p>The distribution includes a set of Python bindings, which are garanteed to <p>The distribution includes a set of Python bindings, which are guaranteed to
be maintained as part of the library in the future, though the Python be maintained as part of the library in the future, though the Python
interface have not yet reached the maturity of the C API.</p> interface have not yet reached the maturity of the C API.</p>
@ -1465,11 +1465,11 @@ doc.freeDoc()</pre>
<p>The Python module is called libxml2, parseFile is the equivalent of <p>The Python module is called libxml2, parseFile is the equivalent of
xmlParseFile (most of the bindings are automatically generated, and the xml xmlParseFile (most of the bindings are automatically generated, and the xml
prefix is removed and the casing convention are kept). All node seen at the prefix is removed and the casing convention are kept). All node seen at the
binding level share the same subset of accesors:</p> binding level share the same subset of accessors:</p>
<ul> <ul>
<li><code>name</code> : returns the node name</li> <li><code>name</code> : returns the node name</li>
<li><code>type</code> : returns a string indicating the node <li><code>type</code> : returns a string indicating the node
typ<code>e</code></li> type</li>
<li><code>content</code> : returns the content of the node, it is based on <li><code>content</code> : returns the content of the node, it is based on
xmlNodeGetContent() and hence is recursive.</li> xmlNodeGetContent() and hence is recursive.</li>
<li><code>parent</code> , <code>children</code>, <code>last</code>, <li><code>parent</code> , <code>children</code>, <code>last</code>,
@ -1478,7 +1478,7 @@ binding level share the same subset of accesors:</p>
those may return None in case no such link exists.</li> those may return None in case no such link exists.</li>
</ul> </ul>
<p>Also note the need to explicitely deallocate documents with freeDoc() . <p>Also note the need to explicitly deallocate documents with freeDoc() .
Reference counting for libxml2 trees would need quite a lot of work to Reference counting for libxml2 trees would need quite a lot of work to
function properly, and rather than risk memory leaks if not implemented function properly, and rather than risk memory leaks if not implemented
correctly it sounds safer to have an explicit function to free a tree. The correctly it sounds safer to have an explicit function to free a tree. The
@ -1491,7 +1491,7 @@ collected.</p>
messages:</p> messages:</p>
<pre>import libxml2 <pre>import libxml2
#desactivate error messages from the validation #deactivate error messages from the validation
def noerr(ctx, str): def noerr(ctx, str):
pass pass
@ -1504,7 +1504,7 @@ doc = ctxt.doc()
valid = ctxt.isValid() valid = ctxt.isValid()
doc.freeDoc() doc.freeDoc()
if valid != 0: if valid != 0:
print "validity chec failed"</pre> print "validity check failed"</pre>
<p>The first thing to notice is the call to registerErrorHandler(), it <p>The first thing to notice is the call to registerErrorHandler(), it
defines a new error handler global to the library. It is used to avoid seeing defines a new error handler global to the library. It is used to avoid seeing
@ -1512,7 +1512,7 @@ the error messages when trying to validate the invalid document.</p>
<p>The main interest of that test is the creation of a parser context with <p>The main interest of that test is the creation of a parser context with
createFileParserCtxt() and how the behaviour can be changed before calling createFileParserCtxt() and how the behaviour can be changed before calling
parseDocument() . Similary the informations resulting from the parsing phase parseDocument() . Similarly the informations resulting from the parsing phase
are also available using context methods.</p> are also available using context methods.</p>
<p>Contexts like nodes are defined as class and the libxml2 wrappers maps the <p>Contexts like nodes are defined as class and the libxml2 wrappers maps the
@ -1531,13 +1531,13 @@ doc = ctxt.doc()
doc.freeDoc()</pre> doc.freeDoc()</pre>
<p>The context is created with a speciall call based on the <p>The context is created with a special call based on the
xmlCreatePushParser() from the C library. The first argument is an optional xmlCreatePushParser() from the C library. The first argument is an optional
SAX callback object, then the initial set of data, the lenght and the name of SAX callback object, then the initial set of data, the length and the name of
the resource in case URI-References need to be computed by the parser.</p> the resource in case URI-References need to be computed by the parser.</p>
<p>Then the data are pushed using the parseChunk() method, the last call <p>Then the data are pushed using the parseChunk() method, the last call
setting the thrird argument terminate to 1.</p> setting the third argument terminate to 1.</p>
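<p>The push model itself can be tried without the libxml2 bindings. The following is only a minimal sketch in Python 3 using the standard library's <code>xml.etree.ElementTree.XMLParser</code>, which is an assumption made for illustration and not the wrapper described above: <code>feed()</code> plays the role of the intermediate parseChunk() calls and <code>close()</code> the role of the final call with terminate set to 1. The helper name <code>parse_in_chunks</code> is hypothetical.</p>

```python
# Python 3 sketch using the standard library, NOT the libxml2 bindings.
import xml.etree.ElementTree as ET


def parse_in_chunks(chunks):
    """Feed a document to the parser piece by piece, then finish parsing."""
    parser = ET.XMLParser()
    for chunk in chunks:
        # Comparable in spirit to parseChunk(data, len, 0)
        parser.feed(chunk)
    # Comparable in spirit to the last parseChunk() call with terminate=1
    return parser.close()


# The element content "bar" is split across two chunks on purpose.
root = parse_in_chunks(["<foo>b", "ar</foo>"])
print(root.tag, root.text)
```

<p>The tree is only available once the parser has been told that no more data will come, which mirrors the role of the terminate argument.</p>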
<h3>pushSAX.py:</h3> <h3>pushSAX.py:</h3>
@ -1592,7 +1592,7 @@ reference = "startDocument:startElement foo {'url': 'tst'}:" + \
"characters: bar:endElement foo:endDocument:" "characters: bar:endElement foo:endDocument:"
if log != reference: if log != reference:
print "Error got: %s" % log print "Error got: %s" % log
print "Exprected: %s" % reference</pre> print "Expected: %s" % reference</pre>
<p>The key object in that test is the handler, it provides a number of entry <p>The key object in that test is the handler, it provides a number of entry
points which can be called by the parser as it makes progresses to indicate points which can be called by the parser as it makes progresses to indicate
@ -1600,7 +1600,7 @@ the information set obtained. The full set of callback is larger than what
the callback class in that specific example implements (see the SAX the callback class in that specific example implements (see the SAX
definition for a complete list). The wrapper will only call those supplied by definition for a complete list). The wrapper will only call those supplied by
the object when activated. The startElement receives the names of the element the object when activated. The startElement receives the names of the element
and a dictionnary containing the attributes carried by this element.</p> and a dictionary containing the attributes carried by this element.</p>
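<p>For readers without the libxml2 module at hand, the shape of such a handler can be sketched with the standard library's <code>xml.sax</code> package in Python 3. This is an assumption chosen for illustration, not the libxml2 SAX wrapper: the method names follow <code>xml.sax.ContentHandler</code>, and the class name <code>LoggingHandler</code> is hypothetical. As in the test above, only the callbacks defined on the object do any work, and startElement receives the element name plus a mapping of its attributes.</p>

```python
# Python 3 sketch using the standard library xml.sax, NOT the libxml2 bindings.
import xml.sax


class LoggingHandler(xml.sax.ContentHandler):
    """Record each SAX event as a short string, in the order received."""

    def __init__(self):
        super().__init__()
        self.log = []

    def startDocument(self):
        self.log.append("startDocument")

    def endDocument(self):
        self.log.append("endDocument")

    def startElement(self, name, attrs):
        # attrs is a mapping-like object; copy it into a plain dict
        self.log.append("startElement %s %s" % (name, dict(attrs)))

    def endElement(self, name):
        self.log.append("endElement %s" % name)

    def characters(self, data):
        # A parser may deliver text in one or several calls
        self.log.append("characters: %s" % data)


handler = LoggingHandler()
xml.sax.parseString(b"<foo url='tst'>bar</foo>", handler)
print(":".join(handler.log))
```

<p>Since a SAX parser is free to split text across several characters() calls, code that needs the full text should concatenate the pieces rather than rely on a single call.</p>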
<p>Also note that the reference string generated from the callback shows a <p>Also note that the reference string generated from the callback shows a
single character call even though the string "bar" is passed to the parser single character call even though the string "bar" is passed to the parser
@ -1608,7 +1608,7 @@ from 2 different call to parseChunk()</p>
<h3>xpath.py:</h3> <h3>xpath.py:</h3>
<p>This is a basic test of XPath warppers support</p> <p>This is a basic test of XPath wrappers support</p>
<pre>import libxml2 <pre>import libxml2
doc = libxml2.parseFile("tst.xml") doc = libxml2.parseFile("tst.xml")
@ -1627,7 +1627,7 @@ ctxt.xpathFreeContext()</pre>
expression on it. The xpathEval() method execute an XPath query and returns expression on it. The xpathEval() method execute an XPath query and returns
the result mapped in a Python way. String and numbers are natively converted, the result mapped in a Python way. String and numbers are natively converted,
and node sets are returned as a tuple of libxml2 Python nodes wrappers. Like and node sets are returned as a tuple of libxml2 Python nodes wrappers. Like
the document, the XPath context need to be freed explicitely, also not that the document, the XPath context need to be freed explicitly, also not that
the result of the XPath query may point back to the document tree and hence the result of the XPath query may point back to the document tree and hence
the document must be freed after the result of the query is used.</p> the document must be freed after the result of the query is used.</p>
@ -1650,7 +1650,7 @@ doc.freeDoc()
ctxt.xpathFreeContext()</pre> ctxt.xpathFreeContext()</pre>
<p>Note how the extension function is registered with the context (but that <p>Note how the extension function is registered with the context (but that
part is not yet finalized, ths may change slightly in the future).</p> part is not yet finalized, this may change slightly in the future).</p>
<h3>tstxpath.py:</h3> <h3>tstxpath.py:</h3>
@ -1687,7 +1687,7 @@ else:
libxml2.dumpMemory()</pre> libxml2.dumpMemory()</pre>
<p>Those activate the memory debugging interface of libxml2 where all <p>Those activate the memory debugging interface of libxml2 where all
alloacted block in the library are tracked. The prologue then cleans up the allocated block in the library are tracked. The prologue then cleans up the
library state and checks that all allocated memory has been freed. If not it library state and checks that all allocated memory has been freed. If not it
calls dumpMemory() which saves that list in a <code>.memdump</code> file.</p> calls dumpMemory() which saves that list in a <code>.memdump</code> file.</p>
@ -1856,8 +1856,8 @@ interface.</p>
<p>Well what is validation and what is a DTD ?</p> <p>Well what is validation and what is a DTD ?</p>
<p>DTD is the acronym for Document Type Definition. This is a description of <p>DTD is the acronym for Document Type Definition. This is a description of
the content for a familly of XML files. This is part of the XML 1.0 the content for a family of XML files. This is part of the XML 1.0
specification, and alows to describe and check that a given document instance specification, and allows to describe and check that a given document instance
conforms to a set of rules detailing its structure and content.</p> conforms to a set of rules detailing its structure and content.</p>
<p>Validation is the process of checking a document against a DTD (more <p>Validation is the process of checking a document against a DTD (more
@ -1890,10 +1890,10 @@ ancient...</p>
<p>Writing DTD can be done in multiple ways, the rules to build them if you <p>Writing DTD can be done in multiple ways, the rules to build them if you
need something fixed or something which can evolve over time can be radically need something fixed or something which can evolve over time can be radically
different. Really complex DTD like Docbook ones are flexible but quite harder different. Really complex DTD like DocBook ones are flexible but quite harder
to design. I will just focuse on DTDs for a formats with a fixed simple to design. I will just focus on DTDs for a formats with a fixed simple
structure. It is just a set of basic rules, and definitely not exhaustive nor structure. It is just a set of basic rules, and definitely not exhaustive nor
useable for complex DTD design.</p> usable for complex DTD design.</p>
<h4><a name="reference1">How to reference a DTD from a document</a>:</h4> <h4><a name="reference1">How to reference a DTD from a document</a>:</h4>
@ -1910,10 +1910,10 @@ is placed in the file <code>mydtd</code> in the subdirectory
full URL string indicating the location of your DTD on the Web, this is a full URL string indicating the location of your DTD on the Web, this is a
really good thing to do if you want others to validate your document</li> really good thing to do if you want others to validate your document</li>
<li>it is also possible to associate a <code>PUBLIC</code> identifier (a <li>it is also possible to associate a <code>PUBLIC</code> identifier (a
magic string) so that the DTd is looked up in catalogs on the client side magic string) so that the DTD is looked up in catalogs on the client side
without having to locate it on the web</li> without having to locate it on the web</li>
<li>a dtd contains a set of elements and attributes declarations, but they <li>a dtd contains a set of elements and attributes declarations, but they
don't define what the root of the document should be. This is explicitely don't define what the root of the document should be. This is explicitly
told to the parser/validator as the first element of the told to the parser/validator as the first element of the
<code>DOCTYPE</code> declaration.</li> <code>DOCTYPE</code> declaration.</li>
</ul> </ul>
@ -1925,9 +1925,9 @@ is placed in the file <code>mydtd</code> in the subdirectory
<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p> <p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>
<p>it also expresses that the spec element contains one <code>front</code>, <p>it also expresses that the spec element contains one <code>front</code>,
one <code>body</code> and one optionnal <code>back</code> children elements one <code>body</code> and one optional <code>back</code> children elements
in this order. The declaration of one element of the structure and its in this order. The declaration of one element of the structure and its
content are done in a single declaration. Similary the following declares content are done in a single declaration. Similarly the following declares
<code>div1</code> elements:</p> <code>div1</code> elements:</p>
<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2?)&gt;</code></p> <p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2?)&gt;</code></p>
@ -1955,7 +1955,7 @@ order.</p>
<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p> <p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>
<p>means that the element <code>termdef</code> can have a <code>name</code> <p>means that the element <code>termdef</code> can have a <code>name</code>
attribute containing text (<code>CDATA</code>) and which is optionnal attribute containing text (<code>CDATA</code>) and which is optional
(<code>#IMPLIED</code>). The attribute value can also be defined within a (<code>#IMPLIED</code>). The attribute value can also be defined within a
set:</p> set:</p>
@ -1964,7 +1964,7 @@ set:</p>
<p>means <code>list</code> element have a <code>type</code> attribute with 3 <p>means <code>list</code> element have a <code>type</code> attribute with 3
allowed values "bullets", "ordered" or "glossary" and which default to allowed values "bullets", "ordered" or "glossary" and which default to
"ordered" if the attribute is not explicitely specified.</p> "ordered" if the attribute is not explicitly specified.</p>
<p>The content type of an attribute can be text (<code>CDATA</code>), <p>The content type of an attribute can be text (<code>CDATA</code>),
anchor/reference/references anchor/reference/references
@ -2004,7 +2004,7 @@ the document.</p>
<h3><a name="validate1">How to validate</a></h3> <h3><a name="validate1">How to validate</a></h3>
<p>The simplest is to use the xmllint program comming with libxml. The <p>The simplest is to use the xmllint program coming with libxml. The
<code>--valid</code> option turn on validation of the files given as input, <code>--valid</code> option turn on validation of the files given as input,
for example the following validates a copy of the first revision of the XML for example the following validates a copy of the first revision of the XML
1.0 specification:</p> 1.0 specification:</p>
@ -2078,7 +2078,7 @@ compatibles).</p>
<h3><a name="cleanup">Cleaning up after parsing</a></h3> <h3><a name="cleanup">Cleaning up after parsing</a></h3>
<p>Libxml is not stateless, there is a few set of memory structures needing <p>Libxml is not stateless, there is a few set of memory structures needing
allocation before the parser is fully functionnal (some encoding structures allocation before the parser is fully functional (some encoding structures
for example). This also mean that once parsing is finished there is a tiny for example). This also mean that once parsing is finished there is a tiny
amount of memory (a few hundred bytes) which can be recollected if you don't amount of memory (a few hundred bytes) which can be recollected if you don't
reuse the parser immediately:</p> reuse the parser immediately:</p>
@ -2117,7 +2117,7 @@ or call a specific routine when a given block number is allocated:</p>
in the <code>.memdump</code> file</li> in the <code>.memdump</code> file</li>
</ul> </ul>
<p>When developping libxml memory debug is enabled, the tests programs call <p>When developing libxml memory debug is enabled, the tests programs call
xmlMemoryDump () and the "make test" regression tests will check for any xmlMemoryDump () and the "make test" regression tests will check for any
memory leak during the full regression test sequence, this helps a lot memory leak during the full regression test sequence, this helps a lot
ensuring that libxml does not leak memory and bullet proof memory ensuring that libxml does not leak memory and bullet proof memory
@ -2127,11 +2127,11 @@ resulting in major portability problems!).</p>
<p>If the .memdump reports a leak, it displays the allocation function and <p>If the .memdump reports a leak, it displays the allocation function and
also tries to give some informations about the content and structure of the also tries to give some informations about the content and structure of the
allocated blocks left. This is sufficient in most cases to find the culprit, allocated blocks left. This is sufficient in most cases to find the culprit,
but not always. Assuming the allocation problem is reproductible, it is but not always. Assuming the allocation problem is reproducible, it is
possible to find more easilly:</p> possible to find more easily:</p>
<ol> <ol>
<li>write down the block number xxxx not allocated</li> <li>write down the block number xxxx not allocated</li>
<li>export the environement variable XML_MEM_BREAKPOINT=xxxx , the easiest <li>export the environment variable XML_MEM_BREAKPOINT=xxxx , the easiest
when using GDB is to simply give the command when using GDB is to simply give the command
<p><code>set environment XML_MEM_BREAKPOINT xxxx</code></p> <p><code>set environment XML_MEM_BREAKPOINT xxxx</code></p>
<p>before running the program.</p> <p>before running the program.</p>
@ -2157,15 +2157,15 @@ spot memory usage errors in a very precise way.</p>
<p>How much libxml memory require ? It's hard to tell in average it depends <p>How much libxml memory require ? It's hard to tell in average it depends
of a number of things:</p> of a number of things:</p>
<ul> <ul>
<li>the parser itself should work in a fixed amout of memory, except for <li>the parser itself should work in a fixed amount of memory, except for
information maintained about the stacks of names and entities locations. information maintained about the stacks of names and entities locations.
The I/O and encoding handlers will probably account for a few KBytes. The I/O and encoding handlers will probably account for a few KBytes.
This is true for both the XML and HTML parser (though the HTML parser This is true for both the XML and HTML parser (though the HTML parser
need more state).</li> need more state).</li>
<li>If you are generating the DOM tree then memory requirements will grow <li>If you are generating the DOM tree then memory requirements will grow
nearly lineary with the size of the data. In general for a balanced nearly linear with the size of the data. In general for a balanced
textual document the internal memory requirement is about 4 times the textual document the internal memory requirement is about 4 times the
size of the UTF8 serialization of this document (exmple the XML-1.0 size of the UTF8 serialization of this document (example the XML-1.0
recommendation is a bit more of 150KBytes and takes 650KBytes of main recommendation is a bit more of 150KBytes and takes 650KBytes of main
memory when parsed). Validation will add a amount of memory required for memory when parsed). Validation will add a amount of memory required for
maintaining the external Dtd state which should be linear with the maintaining the external Dtd state which should be linear with the
@ -2196,19 +2196,19 @@ of a number of things:</p>
<p>XML was designed from the start to allow the support of any character set <p>XML was designed from the start to allow the support of any character set
by using Unicode. Any conformant XML parser has to support the UTF-8 and by using Unicode. Any conformant XML parser has to support the UTF-8 and
UTF-16 default encodings which can both express the full unicode ranges. UTF8 UTF-16 default encodings which can both express the full unicode ranges. UTF8
is a variable length encoding whose greatest point are to resuse the same is a variable length encoding whose greatest points are to reuse the same
emcoding for ASCII and to save space for Western encodings, but it is a bit encoding for ASCII and to save space for Western encodings, but it is a bit
more complex to handle in practice. UTF-16 use 2 bytes per characters (and more complex to handle in practice. UTF-16 use 2 bytes per characters (and
sometimes combines two pairs), it makes implementation easier, but looks a sometimes combines two pairs), it makes implementation easier, but looks a
bit overkill for Western languages encoding. Moreover the XML specification bit overkill for Western languages encoding. Moreover the XML specification
allows document to be encoded in other encodings at the condition that they allows document to be encoded in other encodings at the condition that they
are clearly labelled as such. For example the following is a wellformed XML are clearly labeled as such. For example the following is a wellformed XML
document encoded in ISO-8859 1 and using accentuated letter that we French document encoded in ISO-8859 1 and using accentuated letter that we French
likes for both markup and content:</p> likes for both markup and content:</p>
<pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt; <pre>&lt;?xml version="1.0" encoding="ISO-8859-1"?&gt;
&lt;très&gt;là&lt;/très&gt;</pre> &lt;très&gt;là&lt;/très&gt;</pre>
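<p>To make the size trade-off between UTF-8 and UTF-16 concrete, here is a
minimal, self-contained sketch of both encoders. It is an illustration only
and not libxml code: the names <code>utf8_encode</code> and
<code>utf16_encode</code> are hypothetical helpers invented for this
example.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of libxml): encode one Unicode code point
 * as UTF-8. Writes 1 to 4 bytes into out, which must have room for 4, and
 * returns the number of bytes written, or 0 if cp is a surrogate or lies
 * above U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* ASCII is stored unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* e.g. accented Latin letters */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                   /* rest of the basic plane */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* supplementary planes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Hypothetical helper: encode one code point as UTF-16 code units.
 * Returns 1 for the basic plane, 2 (a surrogate pair) above it,
 * or 0 for an invalid code point. out must have room for 2 units.
 */
size_t utf16_encode(uint32_t cp, uint16_t *out)
{
    assert(out != NULL);
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;                    /* 20 bits split into 10 + 10 */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        return 2;
    }
    return 0;
}
```

<p>With these helpers an ASCII letter costs one byte in UTF-8 against two in
UTF-16, an accented Latin letter costs two bytes in either encoding, and most
CJK characters cost three bytes in UTF-8 against two in UTF-16.</p>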
<p>Having internationalization support in libxml means the foolowing:</p> <p>Having internationalization support in libxml means the following:</p>
<ul> <ul>
<li>the document is properly parsed</li> <li>the document is properly parsed</li>
<li>informations about it's encoding are saved</li> <li>informations about it's encoding are saved</li>
@ -2223,7 +2223,7 @@ exception of a few routines to read with a specific encoding or save to a
specific encoding, is completely agnostic about the original encoding of the specific encoding, is completely agnostic about the original encoding of the
document.</p> document.</p>
<p>It should be noted too that the HTML parser embedded in libxml now obbey <p>It should be noted too that the HTML parser embedded in libxml now obey
the same rules too, the following document will be (as of 2.2.2) handled in the same rules too, the following document will be (as of 2.2.2) handled in
an internationalized fashion by libxml too:</p> an internationalized fashion by libxml too:</p>
<pre>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" <pre>&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
@ -2251,7 +2251,7 @@ rationale for those choices:</p>
cases this may make sense.</li> cases this may make sense.</li>
<li>the second decision was which encoding. From the XML spec only UTF8 and <li>the second decision was which encoding. From the XML spec only UTF8 and
UTF16 really makes sense as being the two only encodings for which there UTF16 really makes sense as being the two only encodings for which there
is amndatory support. UCS-4 (32 bits fixed size encoding) could be is mandatory support. UCS-4 (32 bits fixed size encoding) could be
considered an intelligent choice too since it's a direct Unicode mapping considered an intelligent choice too since it's a direct Unicode mapping
support. I selected UTF-8 on the basis of efficiency and compatibility support. I selected UTF-8 on the basis of efficiency and compatibility
with surrounding software: with surrounding software:
@ -2313,7 +2313,7 @@ err.xml:1: error: Bytes: 0xE8 0x73 0x3E 0x6C
&lt;très&gt;là&lt;/très&gt; &lt;très&gt;là&lt;/très&gt;
^</pre> ^</pre>
</li> </li>
<li>xmlSwitchEncoding() does an encoding name lookup, canonalize it, and <li>xmlSwitchEncoding() does an encoding name lookup, canonicalize it, and
then search the default registered encoding converters for that encoding. then search the default registered encoding converters for that encoding.
If it's not within the default set and iconv() support has been compiled If it's not within the default set and iconv() support has been compiled
it, it will ask iconv for such an encoder. If this fails then the parser it, it will ask iconv for such an encoder. If this fails then the parser
@ -2323,7 +2323,7 @@ err2.xml:1: error: Unsupported encoding UnsupportedEnc
&lt;?xml version="1.0" encoding="UnsupportedEnc"?&gt; &lt;?xml version="1.0" encoding="UnsupportedEnc"?&gt;
^</pre> ^</pre>
</li> </li>
<li>From that point the encoder process progressingly the input (it is <li>From that point the encoder processes progressingly the input (it is
plugged as a front-end to the I/O module) for that entity. It captures plugged as a front-end to the I/O module) for that entity. It captures
and convert on-the-fly the document to be parsed to UTF-8. The parser and convert on-the-fly the document to be parsed to UTF-8. The parser
itself just does UTF-8 checking of this input and process it itself just does UTF-8 checking of this input and process it
@ -2334,8 +2334,8 @@ err2.xml:1: error: Unsupported encoding UnsupportedEnc
with just an encoding information on the document node.</li> with just an encoding information on the document node.</li>
</ol> </ol>
<p>Ok then what's happen when saving the document (assuming you <p>Ok then what happens when saving the document (assuming you
colllected/built an xmlDoc DOM like structure) ? It depends on the function collected/built an xmlDoc DOM like structure) ? It depends on the function
called, xmlSaveFile() will just try to save in the original encoding, while called, xmlSaveFile() will just try to save in the original encoding, while
xmlSaveFileTo() and xmlSaveFileEnc() can optionally save to a given xmlSaveFileTo() and xmlSaveFileEnc() can optionally save to a given
encoding:</p> encoding:</p>
@ -2346,7 +2346,7 @@ encoding:</p>
<p>otherwise everything is written in the internal form, i.e. UTF-8</p> <p>otherwise everything is written in the internal form, i.e. UTF-8</p>
</li> </li>
<li>so if an encoding was specified, either at the API level or on the <li>so if an encoding was specified, either at the API level or on the
document, libxml will again canonalize the encoding name, lookup for a document, libxml will again canonicalize the encoding name, lookup for a
converter in the registered set or through iconv. If not found the converter in the registered set or through iconv. If not found the
function will return an error code</li> function will return an error code</li>
<li>the converter is placed before the I/O buffer layer, as another kind of <li>the converter is placed before the I/O buffer layer, as another kind of
@ -2354,14 +2354,14 @@ encoding:</p>
that buffer, which will then progressively be converted and pushed onto that buffer, which will then progressively be converted and pushed onto
the I/O layer.</li> the I/O layer.</li>
<li>It is possible that the converter code fails on some input, for example <li>It is possible that the converter code fails on some input, for example
trying to push an UTF-8 encoded chinese character through the UTF-8 to trying to push an UTF-8 encoded Chinese character through the UTF-8 to
ISO-8859-1 converter won't work. Since the encoders are progressive they ISO-8859-1 converter won't work. Since the encoders are progressive they
will just report the error and the number of bytes converted, at that will just report the error and the number of bytes converted, at that
point libxml will decode the offending character, remove it from the point libxml will decode the offending character, remove it from the
buffer and replace it with the associated charRef encoding &amp;#123; and buffer and replace it with the associated charRef encoding &amp;#123; and
resume the convertion. This guarante that any document will be saved resume the conversion. This guarantees that any document will be saved
without losses (except for markup names where this is not legal, this is without losses (except for markup names where this is not legal, this is
a problem in the current version, in pactice avoid using non-ascci a problem in the current version, in practice avoid using non-ascii
characters for tags or attributes names @@). A special "ascii" encoding characters for tags or attributes names @@). A special "ascii" encoding
name is used to save documents to a pure ascii form can be used when name is used to save documents to a pure ascii form can be used when
portability is really crucial</li> portability is really crucial</li>
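<p>The character reference fallback described in this item can be sketched
in isolation. The code below is a simplified illustration under stated
assumptions, not the libxml implementation: it converts a whole
NUL-terminated string in one pass instead of working progressively on I/O
buffers, its UTF-8 decoder does not reject overlong forms, and both function
names are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence from in (len bytes left).
 * Returns the number of bytes consumed, or 0 on malformed input.
 * Minimal on purpose: overlong forms are not rejected.
 */
static size_t utf8_decode(const unsigned char *in, size_t len,
                          unsigned long *cp)
{
    size_t n, i;

    if (len == 0)
        return 0;
    if (in[0] < 0x80) {
        *cp = in[0];
        return 1;
    }
    if ((in[0] & 0xE0) == 0xC0)      { n = 2; *cp = in[0] & 0x1F; }
    else if ((in[0] & 0xF0) == 0xE0) { n = 3; *cp = in[0] & 0x0F; }
    else if ((in[0] & 0xF8) == 0xF0) { n = 4; *cp = in[0] & 0x07; }
    else
        return 0;
    if (len < n)
        return 0;
    for (i = 1; i < n; i++) {
        if ((in[i] & 0xC0) != 0x80)   /* continuation bytes are 10xxxxxx */
            return 0;
        *cp = (*cp << 6) | (in[i] & 0x3F);
    }
    return n;
}

/*
 * Hypothetical helper: convert UTF-8 text to ISO-8859-1. Characters that
 * ISO-8859-1 cannot express are written as decimal character references
 * (&#NNN;) so no content is lost. Returns the number of bytes written,
 * excluding the terminating NUL, or -1 on malformed input or if out is
 * too small.
 */
long utf8_to_latin1_charref(const char *in, char *out, size_t outsz)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t left = strlen(in);
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    while (left > 0) {
        unsigned long cp;
        size_t n = utf8_decode(p, left, &cp);

        if (n == 0)
            return -1;
        if (cp <= 0xFF) {
            /* ISO-8859-1 maps directly onto U+0000..U+00FF */
            if (used + 1 >= outsz)
                return -1;
            out[used++] = (char)(unsigned char)cp;
        } else {
            /* not representable: fall back to a character reference */
            char ref[16];
            int m = snprintf(ref, sizeof ref, "&#%lu;", cp);

            if (m < 0 || used + (size_t)m >= outsz)
                return -1;
            memcpy(out + used, ref, (size_t)m);
            used += (size_t)m;
        }
        p += n;
        left -= n;
    }
    out[used] = '\0';
    return (long)used;
}
```

<p>As the item notes, this substitution is only legal in content and
attribute values. A character reference cannot stand in for part of an
element or attribute name, which is why non-ASCII markup names remain a
problem when saving to a narrower encoding.</p>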
@ -2397,7 +2397,7 @@ detecting such a tag on input. Except for that the processing is the same
predefined entities like &amp;copy; for the Copyright sign.</li> predefined entities like &amp;copy; for the Copyright sign.</li>
</ol> </ol>
<p>More over when compiled on an Unix platfor with iconv support the full set <p>More over when compiled on an Unix platform with iconv support the full set
of encodings supported by iconv can be instantly be used by libxml. On a of encodings supported by iconv can be instantly be used by libxml. On a
linux machine with glibc-2.1 the list of supported encodings and aliases fill linux machine with glibc-2.1 the list of supported encodings and aliases fill
3 full pages, and include UCS-4, the full set of ISO-Latin encodings, and the 3 full pages, and include UCS-4, the full set of ISO-Latin encodings, and the
@ -2437,7 +2437,7 @@ tried it. The key is to override the default conversion routines (by
registering null encoders/decoders for your charsets), and bypass the UTF-8 registering null encoders/decoders for your charsets), and bypass the UTF-8
checking of the parser by setting the parser context charset checking of the parser by setting the parser context charset
(ctxt-&gt;charset) to something different than XML_CHAR_ENCODING_UTF8, but (ctxt-&gt;charset) to something different than XML_CHAR_ENCODING_UTF8, but
there is no guarantee taht this will work. You may also have some troubles there is no guarantee that this will work. You may also have some troubles
saving back.</p> saving back.</p>
<p>Basically proper I18N support is important, this requires at least <p>Basically proper I18N support is important, this requires at least
@ -2472,7 +2472,7 @@ the interfaces to the libxml I/O system. This consists of 4 main parts:</p>
<li>Input I/O buffers which are a commodity structure used by the parser(s) <li>Input I/O buffers which are a commodity structure used by the parser(s)
input layer to handle fetching the informations to feed the parser. This input layer to handle fetching the informations to feed the parser. This
provides buffering and is also a placeholder where the encoding provides buffering and is also a placeholder where the encoding
convertors to UTF8 are piggy-backed.</li> converters to UTF8 are piggy-backed.</li>
<li>Output I/O buffers are similar to the Input ones and fulfill similar <li>Output I/O buffers are similar to the Input ones and fulfill similar
task but when generating a serialization from a tree.</li> task but when generating a serialization from a tree.</li>
<li>A mechanism to register sets of I/O callbacks and associate them with <li>A mechanism to register sets of I/O callbacks and associate them with
@ -2499,7 +2499,7 @@ example in the HTML parser is the following:</p>
buffer, providing buffering and efficient use of the conversion buffer, providing buffering and efficient use of the conversion
routines</li> routines</li>
<li>once the parser has finished, the close() function of the handler is <li>once the parser has finished, the close() function of the handler is
called once and the Input buffer and associed resources are called once and the Input buffer and associated resources are
deallocated.</li> deallocated.</li>
</ol> </ol>
@ -2513,7 +2513,7 @@ default libxml I/O routines.</p>
href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a> </code>which is a href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a> </code>which is a
resizable memory buffer. The buffer allocation strategy can be selected to be resizable memory buffer. The buffer allocation strategy can be selected to be
either best-fit or use an exponential doubling one (CPU vs. memory use either best-fit or use an exponential doubling one (CPU vs. memory use
tradeoff). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and trade-off). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
<code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually or on a <code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually or on a
system wide basis using <code>xmlBufferSetAllocationScheme()</code>. A number system wide basis using <code>xmlBufferSetAllocationScheme()</code>. A number
of functions allows to manipulate buffers with names starting with the of functions allows to manipulate buffers with names starting with the
@ -2583,7 +2583,7 @@ and this was a problem. The <a
href="http://xmlsoft.org/messages/0711.html">solution</a> was to redefine a href="http://xmlsoft.org/messages/0711.html">solution</a> was to redefine a
new output handler with the closing call deactivated:</p> new output handler with the closing call deactivated:</p>
<ol> <ol>
<li>First define a new I/O ouput allocator where the output don't close the <li>First define a new I/O output allocator where the output don't close the
file: file:
<pre>xmlOutputBufferPtr <pre>xmlOutputBufferPtr
xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) { xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {
@ -2983,7 +2983,7 @@ support.</p>
<p>The XML Catalog specification is relatively recent so there isn't much <p>The XML Catalog specification is relatively recent so there isn't much
literature to point at:</p> literature to point at:</p>
<ul> <ul>
<li>You can find an good rant from Norm Walsh about <a <li>You can find a good rant from Norm Walsh about <a
href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the href="http://www.arbortext.com/Think_Tank/XML_Resources/Issue_Three/issue_three.html">the
need for catalogs</a>, it provides a lot of context informations even if need for catalogs</a>, it provides a lot of context informations even if
I don't agree with everything presented. Norm also wrote a more recent I don't agree with everything presented. Norm also wrote a more recent
@ -3007,7 +3007,7 @@ literature to point at:</p>
~/xmlcatalog and ~/dbkxmlcatalog and doing: ~/xmlcatalog and ~/dbkxmlcatalog and doing:
<p><code>export XMLCATALOG=$HOME/xmlcatalog</code></p> <p><code>export XMLCATALOG=$HOME/xmlcatalog</code></p>
<p>should allow to process DocBook documentations without requiring <p>should allow to process DocBook documentations without requiring
network accesses for the DTd or stylesheets</p> network accesses for the DTD or stylesheets</p>
</li> </li>
<li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a <li>I have uploaded <a href="ftp://xmlsoft.org/test/dbk412catalog.tar.gz">a
small tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems small tarball</a> containing XML Catalogs for DocBook 4.1.2 which seems
@ -3257,7 +3257,7 @@ beginning). Example:</p>
<p>Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing <p>Line 3 declares the xml entity. Line 6 uses the xml entity, by prefixing
its name with '&amp;' and following it by ';' without any spaces added. There its name with '&amp;' and following it by ';' without any spaces added. There
are 5 predefined entities in libxml allowing you to escape charaters with are 5 predefined entities in libxml allowing you to escape characters with
predefined meaning in some parts of the xml document content: predefined meaning in some parts of the xml document content:
<strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong> <strong>&amp;lt;</strong> for the character '&lt;', <strong>&amp;gt;</strong>
for the character '&gt;', <strong>&amp;apos;</strong> for the character ''', for the character '&gt;', <strong>&amp;apos;</strong> for the character ''',
@ -3270,7 +3270,7 @@ your application. Or you may prefer to keep entity references as such in the
content to be able to save the document back without losing this usually content to be able to save the document back without losing this usually
precious information (if the user went through the pain of explicitly precious information (if the user went through the pain of explicitly
defining entities, he may have a a rather negative attitude if you blindly defining entities, he may have a a rather negative attitude if you blindly
susbtitute them as saving time). The <a substitute them as saving time). The <a
href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a> href="html/libxml-parser.html#XMLSUBSTITUTEENTITIESDEFAULT">xmlSubstituteEntitiesDefault()</a>
function allows you to check and change the behaviour, which is to not function allows you to check and change the behaviour, which is to not
substitute entities by default.</p> substitute entities by default.</p>
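<p>As a side illustration of the five predefined entities of XML 1.0
(<code>&amp;lt;</code>, <code>&amp;gt;</code>, <code>&amp;amp;</code>,
<code>&amp;apos;</code> and <code>&amp;quot;</code>), the sketch below
escapes a string so that it can be placed safely in content or in an
attribute value. It is a stand-alone example with hypothetical names, not
one of the libxml escaping routines.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the predefined entity reference for c, or NULL if none applies. */
static const char *predefined_entity(char c)
{
    switch (c) {
    case '<':  return "&lt;";
    case '>':  return "&gt;";
    case '&':  return "&amp;";
    case '\'': return "&apos;";
    case '"':  return "&quot;";
    default:   return NULL;
    }
}

/*
 * Hypothetical helper: copy src into dst, replacing the five special
 * characters by their predefined entity references. Returns the length
 * of the escaped string, or -1 if dst (dstsz bytes) is too small.
 */
long xml_escape(const char *src, char *dst, size_t dstsz)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL && dstsz > 0);
    for (; *src != '\0'; src++) {
        const char *ent = predefined_entity(*src);
        size_t n = ent ? strlen(ent) : 1;

        if (used + n >= dstsz)        /* keep room for the final NUL */
            return -1;
        if (ent)
            memcpy(dst + used, ent, n);
        else
            dst[used] = *src;
        used += n;
    }
    dst[used] = '\0';
    return (long)used;
}
```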
@ -3310,7 +3310,7 @@ finding them in the input).</p>
<p><span style="background-color: #FF0000">WARNING</span>: handling entities <p><span style="background-color: #FF0000">WARNING</span>: handling entities
on top of the libxml SAX interface is difficult!!! If you plan to use on top of the libxml SAX interface is difficult!!! If you plan to use
non-predefined entities in your documents, then the learning cuvre to handle non-predefined entities in your documents, then the learning curve to handle
then using the SAX API may be long. If you plan to use complex documents, I then using the SAX API may be long. If you plan to use complex documents, I
strongly suggest you consider using the DOM interface instead and let libxml strongly suggest you consider using the DOM interface instead and let libxml
deal with the complexity rather than trying to do it yourself.</p> deal with the complexity rather than trying to do it yourself.</p>
@ -3319,7 +3319,7 @@ deal with the complexity rather than trying to do it yourself.</p>
<p>The libxml library implements <a <p>The libxml library implements <a
href="http://www.w3.org/TR/REC-xml-names/">XML namespaces</a> support by href="http://www.w3.org/TR/REC-xml-names/">XML namespaces</a> support by
recognizing namespace contructs in the input, and does namespace lookup recognizing namespace constructs in the input, and does namespace lookup
automatically when building the DOM tree. A namespace declaration is automatically when building the DOM tree. A namespace declaration is
associated with an in-memory structure and all elements or attributes within associated with an in-memory structure and all elements or attributes within
that namespace point to it. Hence testing the namespace is a simple and fast that namespace point to it. Hence testing the namespace is a simple and fast
@ -3338,7 +3338,7 @@ value in the long-term. Example:</p>
<p>The namespace value has to be an absolute URL, but the URL doesn't have to <p>The namespace value has to be an absolute URL, but the URL doesn't have to
point to any existing resource on the Web. It will bind all the element and point to any existing resource on the Web. It will bind all the element and
atributes with that URL. I suggest to use an URL within a domain you control, attributes with that URL. I suggest to use an URL within a domain you control,
and that the URL should contain some kind of version information if possible. and that the URL should contain some kind of version information if possible.
For example, <code>"http://www.gnome.org/gnumeric/1.0/"</code> is a good For example, <code>"http://www.gnome.org/gnumeric/1.0/"</code> is a good
namespace scheme.</p> namespace scheme.</p>
@ -3402,14 +3402,14 @@ mail</a>:</p>
select the right parameters libxml2</li> select the right parameters libxml2</li>
<li>Node <strong>childs</strong> field has been renamed <li>Node <strong>childs</strong> field has been renamed
<strong>children</strong> so s/childs/children/g should be applied <strong>children</strong> so s/childs/children/g should be applied
(probablility of having "childs" anywere else is close to 0+</li> (probability of having "childs" anywhere else is close to 0+</li>
<li>The document don't have anymore a <strong>root</strong> element it has <li>The document don't have anymore a <strong>root</strong> element it has
been replaced by <strong>children</strong> and usually you will get a been replaced by <strong>children</strong> and usually you will get a
list of element here. For example a Dtd element for the internal subset list of element here. For example a Dtd element for the internal subset
and it's declaration may be found in that list, as well as processing and it's declaration may be found in that list, as well as processing
instructions or comments found before or after the document root element. instructions or comments found before or after the document root element.
Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of Use <strong>xmlDocGetRootElement(doc)</strong> to get the root element of
a document. Alternatively if you are sure to not reference Dtds nor have a document. Alternatively if you are sure to not reference DTDs nor have
PIs or comments before or after the root element PIs or comments before or after the root element
s/-&gt;root/-&gt;children/g will probably do it.</li> s/-&gt;root/-&gt;children/g will probably do it.</li>
<li>The white space issue, this one is more complex, unless special case of <li>The white space issue, this one is more complex, unless special case of
@ -3423,9 +3423,9 @@ mail</a>:</p>
relying on a special (and possibly broken) set of heuristics of relying on a special (and possibly broken) set of heuristics of
libxml to detect ignorable blanks. Don't complain if it breaks or libxml to detect ignorable blanks. Don't complain if it breaks or
make your application not 100% clean w.r.t. to it's input.</li> make your application not 100% clean w.r.t. to it's input.</li>
<li>the Right Way: change you code to accept possibly unsignificant <li>the Right Way: change you code to accept possibly insignificant
blanks characters, or have your tree populated with weird blank text blanks characters, or have your tree populated with weird blank text
nodes. You can spot them using the comodity function nodes. You can spot them using the commodity function
<strong>xmlIsBlankNode(node)</strong> returning 1 for such blank <strong>xmlIsBlankNode(node)</strong> returning 1 for such blank
nodes.</li> nodes.</li>
</ol> </ol>
@ -3441,14 +3441,14 @@ mail</a>:</p>
<p>output to generate you compile commands this will probably work out of <p>output to generate you compile commands this will probably work out of
the box</p> the box</p>
</li> </li>
<li>xmlDetectCharEncoding takes an extra argument indicating the lenght in <li>xmlDetectCharEncoding takes an extra argument indicating the length in
byte of the head of the document available for character detection.</li> byte of the head of the document available for character detection.</li>
</ol> </ol>
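<p>The reason a length argument matters for encoding detection can be shown
with a small stand-alone sketch. It follows the general byte-pattern approach
of the XML 1.0 specification (Appendix F) but is deliberately incomplete, and
it is not the libxml routine: <code>sniff_encoding</code> and the
<code>ENC_*</code> values are hypothetical names for this example.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ENC_UNKNOWN,   /* not enough bytes, or no recognizable pattern */
    ENC_UTF8,
    ENC_UTF16LE,
    ENC_UTF16BE,
    ENC_UCS4LE,
    ENC_UCS4BE
} sniffed_encoding;

/*
 * Hypothetical helper: guess the encoding from the first len bytes of a
 * document. Only patterns that fit entirely inside the available bytes
 * are tested, which is exactly why the caller must pass the length.
 */
sniffed_encoding sniff_encoding(const unsigned char *in, int len)
{
    assert(len >= 0);                 /* a negative length is a caller bug */
    if (in == NULL || len < 2)
        return ENC_UNKNOWN;

    if (len >= 4) {
        /* "<" encoded on four bytes */
        if (in[0] == 0x00 && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x3C)
            return ENC_UCS4BE;
        if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x00)
            return ENC_UCS4LE;
        /* "<?" encoded on two bytes per character, no byte order mark */
        if (in[0] == 0x00 && in[1] == 0x3C && in[2] == 0x00 && in[3] == 0x3F)
            return ENC_UTF16BE;
        if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x3F && in[3] == 0x00)
            return ENC_UTF16LE;
    }
    /* byte order marks */
    if (len >= 3 && in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
        return ENC_UTF8;
    if (in[0] == 0xFE && in[1] == 0xFF)
        return ENC_UTF16BE;
    if (in[0] == 0xFF && in[1] == 0xFE)
        return ENC_UTF16LE;

    return ENC_UNKNOWN;               /* fall back to the XML declaration */
}
```

<p>Passing the same buffer with a shorter length can turn a positive
detection into <code>ENC_UNKNOWN</code>, because a four-byte pattern cannot
be checked safely when only two bytes are available.</p>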
<h3>Ensuring both libxml-1.x and libxml-2.x compatibility</h3> <h3>Ensuring both libxml-1.x and libxml-2.x compatibility</h3>
<p>Two new version of libxml (1.8.11) and libxml2 (2.3.4) have been released <p>Two new version of libxml (1.8.11) and libxml2 (2.3.4) have been released
to allow smoth upgrade of existing libxml v1code while retaining to allow smooth upgrade of existing libxml v1code while retaining
compatibility. They offers the following:</p> compatibility. They offers the following:</p>
<ol> <ol>
<li>similar include naming, one should use <li>similar include naming, one should use
@ -3464,15 +3464,15 @@ compatibility. They offers the following:</p>
following:</p> following:</p>
<ol> <ol>
<li>install the libxml-1.8.8 (and libxml-devel-1.8.8) packages</li> <li>install the libxml-1.8.8 (and libxml-devel-1.8.8) packages</li>
<li>find all occurences where the xmlDoc <strong>root</strong> field is <li>find all occurrences where the xmlDoc <strong>root</strong> field is
used and change it to <strong>xmlRootNode</strong></li> used and change it to <strong>xmlRootNode</strong></li>
<li>similary find all occurences where the xmlNode <strong>childs</strong> <li>similarly find all occurrences where the xmlNode <strong>childs</strong>
field is used and change it to <strong>xmlChildrenNode</strong></li> field is used and change it to <strong>xmlChildrenNode</strong></li>
<li>add a <strong>LIBXML_TEST_VERSION</strong> macro somewhere in your <li>add a <strong>LIBXML_TEST_VERSION</strong> macro somewhere in your
<strong>main()</strong> or in the library init entry point</li> <strong>main()</strong> or in the library init entry point</li>
<li>Recompile, check compatibility, it should still work</li> <li>Recompile, check compatibility, it should still work</li>
<li>Change your configure script to look first for xml2-config and fall back <li>Change your configure script to look first for xml2-config and fall back
using xml-config . Use the --cflags and --libs ouptut of the command as using xml-config . Use the --cflags and --libs output of the command as
the Include and Linking parameters needed to use libxml.</li> the Include and Linking parameters needed to use libxml.</li>
<li>install libxml2-2.3.x and libxml2-devel-2.3.x (libxml-1.8.y and <li>install libxml2-2.3.x and libxml2-devel-2.3.x (libxml-1.8.y and
libxml-devel-1.8.y can be kept simultaneously)</li> libxml-devel-1.8.y can be kept simultaneously)</li>
@ -3495,7 +3495,7 @@ not upgrade, it may cost a lot on the long term ...</p>
<h2><a name="Thread">Thread safety</a></h2> <h2><a name="Thread">Thread safety</a></h2>
<p>Starting with 2.4.7, libxml makes provisions to ensure that concurent <p>Starting with 2.4.7, libxml makes provisions to ensure that concurrent
threads can safely work in parallel parsing different documents. There is threads can safely work in parallel parsing different documents. There is
however a couple of things to do to ensure it:</p> however a couple of things to do to ensure it:</p>
<ul> <ul>
@ -3602,7 +3602,7 @@ base</a>:</p>
&lt;/gjob:Helping&gt;</pre> &lt;/gjob:Helping&gt;</pre>
<p>While loading the XML file into an internal DOM tree is a matter of <p>While loading the XML file into an internal DOM tree is a matter of
calling only a couple of functions, browsing the tree to gather the ata and calling only a couple of functions, browsing the tree to gather the data and
generate the internal structures is harder, and more error prone.</p> generate the internal structures is harder, and more error prone.</p>
<p>The suggested principle is to be tolerant with respect to the input <p>The suggested principle is to be tolerant with respect to the input
@ -3656,8 +3656,8 @@ DEBUG("parsePerson\n");
<p>Here are a couple of things to notice:</p> <p>Here are a couple of things to notice:</p>
<ul> <ul>
<li>Usually a recursive parsing style is the more convenient one: XML data <li>Usually a recursive parsing style is the more convenient one: XML data
is by nature subject to repetitive constructs and usually exibits highly is by nature subject to repetitive constructs and usually exhibits highly
stuctured patterns.</li> structured patterns.</li>
<li>The two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>, <li>The two arguments of type <em>xmlDocPtr</em> and <em>xmlNsPtr</em>,
i.e. the pointer to the global XML document and the namespace reserved to i.e. the pointer to the global XML document and the namespace reserved to
the application. Document wide information are needed for example to the application. Document wide information are needed for example to
@ -3725,7 +3725,7 @@ DEBUG("parseJob\n");
}</pre> }</pre>
<p>Once you are used to it, writing this kind of code is quite simple, but <p>Once you are used to it, writing this kind of code is quite simple, but
boring. Ultimately, it could be possble to write stubbers taking either C boring. Ultimately, it could be possible to write stubbers taking either C
data structure definitions, a set of XML examples or an XML DTD and produce data structure definitions, a set of XML examples or an XML DTD and produce
the code needed to import and export the content between C data and XML the code needed to import and export the content between C data and XML
storage. This is left as an exercise to the reader :-)</p> storage. This is left as an exercise to the reader :-)</p>
@ -3748,8 +3748,8 @@ Gnome CVS base under gnome-xml/example</p>
<a href="http://garypennington.net/libxml2/">Solaris binaries</a></li> <a href="http://garypennington.net/libxml2/">Solaris binaries</a></li>
<li><a <li><a
href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt href="http://mail.gnome.org/archives/xml/2001-March/msg00014.html">Matt
Sergeant</a> developped <a Sergeant</a> developed <a
href="http://axkit.org/download/">XML::LibXSLT</a>, a perl wrapper for href="http://axkit.org/download/">XML::LibXSLT</a>, a Perl wrapper for
libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML libxml2/libxslt as part of the <a href="http://axkit.com/">AxKit XML
application server</a></li> application server</a></li>
<li><a href="mailto:fnatter@gmx.net">Felix Natter</a> and <a <li><a href="mailto:fnatter@gmx.net">Felix Natter</a> and <a


@ -104,8 +104,8 @@ A:link, A:visited, A:active { text-decoration: underline }
<h3><a name="General5">General overview</a></h3> <h3><a name="General5">General overview</a></h3>
<p>Well what is validation and what is a DTD ?</p> <p>Well what is validation and what is a DTD ?</p>
<p>DTD is the acronym for Document Type Definition. This is a description of <p>DTD is the acronym for Document Type Definition. This is a description of
the content for a familly of XML files. This is part of the XML 1.0 the content for a family of XML files. This is part of the XML 1.0
specification, and alows to describe and check that a given document instance specification, and allows to describe and check that a given document instance
conforms to a set of rules detailing its structure and content.</p> conforms to a set of rules detailing its structure and content.</p>
<p>Validation is the process of checking a document against a DTD (more <p>Validation is the process of checking a document against a DTD (more
generally against a set of construction rules).</p> generally against a set of construction rules).</p>
@ -130,10 +130,10 @@ ancient...</p>
<h3><a name="Simple1">Simple rules</a></h3> <h3><a name="Simple1">Simple rules</a></h3>
<p>Writing DTD can be done in multiple ways, the rules to build them if you <p>Writing DTD can be done in multiple ways, the rules to build them if you
need something fixed or something which can evolve over time can be radically need something fixed or something which can evolve over time can be radically
different. Really complex DTD like Docbook ones are flexible but quite harder different. Really complex DTD like DocBook ones are flexible but quite harder
to design. I will just focuse on DTDs for a formats with a fixed simple to design. I will just focus on DTDs for a formats with a fixed simple
structure. It is just a set of basic rules, and definitely not exhaustive nor structure. It is just a set of basic rules, and definitely not exhaustive nor
useable for complex DTD design.</p> usable for complex DTD design.</p>
<h4> <h4>
<a name="reference1">How to reference a DTD from a document</a>:</h4> <a name="reference1">How to reference a DTD from a document</a>:</h4>
<p>Assuming the top element of the document is <code>spec</code> and the dtd <p>Assuming the top element of the document is <code>spec</code> and the dtd
@ -146,10 +146,10 @@ is placed in the file <code>mydtd</code> in the subdirectory
full URL string indicating the location of your DTD on the Web, this is a full URL string indicating the location of your DTD on the Web, this is a
really good thing to do if you want others to validate your document</li> really good thing to do if you want others to validate your document</li>
<li>it is also possible to associate a <code>PUBLIC</code> identifier (a <li>it is also possible to associate a <code>PUBLIC</code> identifier (a
magic string) so that the DTd is looked up in catalogs on the client side magic string) so that the DTD is looked up in catalogs on the client side
without having to locate it on the web</li> without having to locate it on the web</li>
<li>a dtd contains a set of elements and attributes declarations, but they <li>a dtd contains a set of elements and attributes declarations, but they
don't define what the root of the document should be. This is explicitely don't define what the root of the document should be. This is explicitly
told to the parser/validator as the first element of the told to the parser/validator as the first element of the
<code>DOCTYPE</code> declaration.</li> <code>DOCTYPE</code> declaration.</li>
</ul> </ul>
@ -158,9 +158,9 @@ is placed in the file <code>mydtd</code> in the subdirectory
<p>The following declares an element <code>spec</code>:</p> <p>The following declares an element <code>spec</code>:</p>
<p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p> <p><code>&lt;!ELEMENT spec (front, body, back?)&gt;</code></p>
<p>it also expresses that the spec element contains one <code>front</code>, <p>it also expresses that the spec element contains one <code>front</code>,
one <code>body</code> and one optionnal <code>back</code> children elements one <code>body</code> and one optional <code>back</code> children elements
in this order. The declaration of one element of the structure and its in this order. The declaration of one element of the structure and its
content are done in a single declaration. Similary the following declares content are done in a single declaration. Similarly the following declares
<code>div1</code> elements:</p> <code>div1</code> elements:</p>
<p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2?)&gt;</code></p> <p><code>&lt;!ELEMENT div1 (head, (p | list | note)*, div2?)&gt;</code></p>
<p>means div1 contains one <code>head</code> then a series of optional <p>means div1 contains one <code>head</code> then a series of optional
@ -181,14 +181,14 @@ order.</p>
<p>again the attributes declaration includes their content definition:</p> <p>again the attributes declaration includes their content definition:</p>
<p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p> <p><code>&lt;!ATTLIST termdef name CDATA #IMPLIED&gt;</code></p>
<p>means that the element <code>termdef</code> can have a <code>name</code> <p>means that the element <code>termdef</code> can have a <code>name</code>
attribute containing text (<code>CDATA</code>) and which is optionnal attribute containing text (<code>CDATA</code>) and which is optional
(<code>#IMPLIED</code>). The attribute value can also be defined within a (<code>#IMPLIED</code>). The attribute value can also be defined within a
set:</p> set:</p>
<p><code>&lt;!ATTLIST list type (bullets|ordered|glossary) <p><code>&lt;!ATTLIST list type (bullets|ordered|glossary)
&quot;ordered&quot;&gt;</code></p> &quot;ordered&quot;&gt;</code></p>
<p>means <code>list</code> element have a <code>type</code> attribute with 3 <p>means <code>list</code> element have a <code>type</code> attribute with 3
allowed values &quot;bullets&quot;, &quot;ordered&quot; or &quot;glossary&quot; and which default to allowed values &quot;bullets&quot;, &quot;ordered&quot; or &quot;glossary&quot; and which default to
&quot;ordered&quot; if the attribute is not explicitely specified.</p> &quot;ordered&quot; if the attribute is not explicitly specified.</p>
<p>The content type of an attribute can be text (<code>CDATA</code>), <p>The content type of an attribute can be text (<code>CDATA</code>),
anchor/reference/references anchor/reference/references
(<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies) (<code>ID</code>/<code>IDREF</code>/<code>IDREFS</code>), entity(ies)
@ -219,7 +219,7 @@ contains some complex DTD examples. The <code>test/valid/dia.xml</code>
example shows an XML file where the simple DTD is directly included within example shows an XML file where the simple DTD is directly included within
the document.</p> the document.</p>
<h3><a name="validate1">How to validate</a></h3> <h3><a name="validate1">How to validate</a></h3>
<p>The simplest is to use the xmllint program comming with libxml. The <p>The simplest is to use the xmllint program coming with libxml. The
<code>--valid</code> option turn on validation of the files given as input, <code>--valid</code> option turn on validation of the files given as input,
for example the following validates a copy of the first revision of the XML for example the following validates a copy of the first revision of the XML
1.0 specification:</p> 1.0 specification:</p>


@ -109,7 +109,7 @@ the interfaces to the libxml I/O system. This consists of 4 main parts:</p>
<li>Input I/O buffers which are a commodity structure used by the parser(s) <li>Input I/O buffers which are a commodity structure used by the parser(s)
input layer to handle fetching the informations to feed the parser. This input layer to handle fetching the informations to feed the parser. This
provides buffering and is also a placeholder where the encoding provides buffering and is also a placeholder where the encoding
convertors to UTF8 are piggy-backed.</li> converters to UTF8 are piggy-backed.</li>
<li>Output I/O buffers are similar to the Input ones and fulfill similar <li>Output I/O buffers are similar to the Input ones and fulfill similar
task but when generating a serialization from a tree.</li> task but when generating a serialization from a tree.</li>
<li>A mechanism to register sets of I/O callbacks and associate them with <li>A mechanism to register sets of I/O callbacks and associate them with
@ -135,7 +135,7 @@ example in the HTML parser is the following:</p>
buffer, providing buffering and efficient use of the conversion buffer, providing buffering and efficient use of the conversion
routines</li> routines</li>
<li>once the parser has finished, the close() function of the handler is <li>once the parser has finished, the close() function of the handler is
called once and the Input buffer and associed resources are called once and the Input buffer and associated resources are
deallocated.</li> deallocated.</li>
</ol> </ol>
<p>The user defined callbacks are checked first to allow overriding of the <p>The user defined callbacks are checked first to allow overriding of the
@ -145,7 +145,7 @@ default libxml I/O routines.</p>
<code>xmlBuffer</code> type define in <code><a href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a></code>which is a <code>xmlBuffer</code> type define in <code><a href="http://xmlsoft.org/html/libxml-tree.html">tree.h</a></code>which is a
resizable memory buffer. The buffer allocation strategy can be selected to be resizable memory buffer. The buffer allocation strategy can be selected to be
either best-fit or use an exponential doubling one (CPU vs. memory use either best-fit or use an exponential doubling one (CPU vs. memory use
tradeoff). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and trade-off). The values are <code>XML_BUFFER_ALLOC_EXACT</code> and
<code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually or on a <code>XML_BUFFER_ALLOC_DOUBLEIT</code>, and can be set individually or on a
system wide basis using <code>xmlBufferSetAllocationScheme()</code>. A number system wide basis using <code>xmlBufferSetAllocationScheme()</code>. A number
of functions allows to manipulate buffers with names starting with the of functions allows to manipulate buffers with names starting with the
@ -205,7 +205,7 @@ real use case</a>, xmlDocDump() closes the FILE * passed by the application
and this was a problem. The <a href="http://xmlsoft.org/messages/0711.html">solution</a> was to redefine a and this was a problem. The <a href="http://xmlsoft.org/messages/0711.html">solution</a> was to redefine a
new output handler with the closing call deactivated:</p> new output handler with the closing call deactivated:</p>
<ol> <ol>
<li>First define a new I/O ouput allocator where the output don't close the <li>First define a new I/O output allocator where the output don't close the
file: file:
<pre>xmlOutputBufferPtr <pre>xmlOutputBufferPtr
xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) { xmlOutputBufferCreateOwn(FILE *file, xmlCharEncodingHandlerPtr encoder) {


@ -121,7 +121,7 @@ any other libxml routines (unless you are sure your allocations routines are
compatibles).</p> compatibles).</p>
<h3><a name="cleanup">Cleaning up after parsing</a></h3> <h3><a name="cleanup">Cleaning up after parsing</a></h3>
<p>Libxml is not stateless, there is a few set of memory structures needing <p>Libxml is not stateless, there is a few set of memory structures needing
allocation before the parser is fully functionnal (some encoding structures allocation before the parser is fully functional (some encoding structures
for example). This also mean that once parsing is finished there is a tiny for example). This also mean that once parsing is finished there is a tiny
amount of memory (a few hundred bytes) which can be recollected if you don't amount of memory (a few hundred bytes) which can be recollected if you don't
reuse the parser immediately:</p> reuse the parser immediately:</p>
@ -156,7 +156,7 @@ or call a specific routine when a given block number is allocated:</p>
()</a> dumps all the informations about the allocated memory block lefts ()</a> dumps all the informations about the allocated memory block lefts
in the <code>.memdump</code> file</li> in the <code>.memdump</code> file</li>
</ul> </ul>
<p>When developping libxml memory debug is enabled, the tests programs call <p>When developing libxml memory debug is enabled, the tests programs call
xmlMemoryDump () and the &quot;make test&quot; regression tests will check for any xmlMemoryDump () and the &quot;make test&quot; regression tests will check for any
memory leak during the full regression test sequence, this helps a lot memory leak during the full regression test sequence, this helps a lot
ensuring that libxml does not leak memory and bullet proof memory ensuring that libxml does not leak memory and bullet proof memory
@ -165,11 +165,11 @@ resulting in major portability problems!).</p>
<p>If the .memdump reports a leak, it displays the allocation function and <p>If the .memdump reports a leak, it displays the allocation function and
also tries to give some informations about the content and structure of the also tries to give some informations about the content and structure of the
allocated blocks left. This is sufficient in most cases to find the culprit, allocated blocks left. This is sufficient in most cases to find the culprit,
but not always. Assuming the allocation problem is reproductible, it is but not always. Assuming the allocation problem is reproducible, it is
possible to find more easilly:</p> possible to find more easily:</p>
<ol> <ol>
<li>write down the block number xxxx not allocated</li> <li>write down the block number xxxx not allocated</li>
<li>export the environement variable XML_MEM_BREAKPOINT=xxxx , the easiest <li>export the environment variable XML_MEM_BREAKPOINT=xxxx , the easiest
when using GDB is to simply give the command when using GDB is to simply give the command
<p><code>set environment XML_MEM_BREAKPOINT xxxx</code></p> <p><code>set environment XML_MEM_BREAKPOINT xxxx</code></p>
<p>before running the program.</p> <p>before running the program.</p>
@ -191,15 +191,15 @@ spot memory usage errors in a very precise way.</p>
<p>How much libxml memory require ? It's hard to tell in average it depends <p>How much libxml memory require ? It's hard to tell in average it depends
of a number of things:</p> of a number of things:</p>
<ul> <ul>
<li>the parser itself should work in a fixed amout of memory, except for <li>the parser itself should work in a fixed amount of memory, except for
information maintained about the stacks of names and entities locations. information maintained about the stacks of names and entities locations.
The I/O and encoding handlers will probably account for a few KBytes. The I/O and encoding handlers will probably account for a few KBytes.
This is true for both the XML and HTML parser (though the HTML parser This is true for both the XML and HTML parser (though the HTML parser
need more state).</li> need more state).</li>
<li>If you are generating the DOM tree then memory requirements will grow <li>If you are generating the DOM tree then memory requirements will grow
nearly lineary with the size of the data. In general for a balanced nearly linear with the size of the data. In general for a balanced
textual document the internal memory requirement is about 4 times the textual document the internal memory requirement is about 4 times the
size of the UTF8 serialization of this document (exmple the XML-1.0 size of the UTF8 serialization of this document (example the XML-1.0
recommendation is a bit more of 150KBytes and takes 650KBytes of main recommendation is a bit more of 150KBytes and takes 650KBytes of main
memory when parsed). Validation will add a amount of memory required for memory when parsed). Validation will add a amount of memory required for
maintaining the external Dtd state which should be linear with the maintaining the external Dtd state which should be linear with the