Put documentation on XML data type and functions in better positions. Add
some index terms.
2025-07-31 22:04:40 +03:00 · 2007-04-02 15:27:02 +00:00
parent b7d3a84539
commit 626b4416b9
3 changed files with 807 additions and 797 deletions
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.118 2007/03/26 01:41:57 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.119 2007/04/02 15:27:02 petere Exp $ -->
 <chapter Id="runtime-config">
  <title>Server Configuration</title>
@@ -3591,7 +3591,7 @@ SELECT * FROM parent WHERE key = 2400;
       <primary><varname>SET XML OPTION</></primary>
      </indexterm>
      <indexterm>
-       <primary><varname>XML option</></primary>
+       <primary>XML option</primary>
      </indexterm>
      <listitem>
       <para>
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.192 2007/04/02 03:49:36 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.193 2007/04/02 15:27:02 petere Exp $ -->
 <chapter id="datatype">
  <title id="datatype-title">Data Types</title>
@@ -3190,6 +3190,144 @@ SELECT * FROM test;
  </sect1>
  <sect1 id="datatype-xml">
   <title><acronym>XML</> Type</title>
   <indexterm zone="datatype-xml">
    <primary>XML</primary>
   </indexterm>
   <para>
    The data type <type>xml</type> can be used to store XML data.  Its
    advantage over storing XML data in a <type>text</type> field is that it
    checks the input values for well-formedness, and there are support
    functions to perform type-safe operations on it; see <xref
    linkend="functions-xml">.
   </para>
   <para>
    In particular, the <type>xml</type> type can store well-formed
    <quote>documents</quote>, as defined by the XML standard, as well
    as <quote>content</quote> fragments, which are defined by the
    production <literal>XMLDecl? content</literal> in the XML
    standard.  Roughly, this means that content fragments can have
    more than one top-level element or character node.  The expression
    <literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
    can be used to evaluate whether a particular <type>xml</type>
    value is a full document or only a content fragment.
   </para>
   <para>
    To produce a value of type <type>xml</type> from character data,
    use the function
    <function>xmlparse</function>:<indexterm><primary>xmlparse</primary></indexterm>
 <synopsis>
 XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
 </synopsis>
    Examples:
 <programlisting><![CDATA[
 XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
 XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
 ]]></programlisting>
    While this is the only way to convert character strings into XML
    values according to the SQL standard, the PostgreSQL-specific
    syntaxes:
 <programlisting><![CDATA[
 xml '<foo>bar</foo>'
 '<foo>bar</foo>'::xml
 ]]></programlisting>
    can also be used.
   </para>
   <para>
    The <type>xml</type> type does not validate its input values
    against a possibly included document type declaration
    (DTD).<indexterm><primary>DTD</primary></indexterm>
   </para>
   <para>
    The inverse operation, producing character string type values from
    <type>xml</type>, uses the function
    <function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
 <synopsis>
 XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
 </synopsis>
    <replaceable>type</replaceable> can be one of
    <type>character</type>, <type>character varying</type>, or
    <type>text</type> (or an alias name for those).  Again, according
    to the SQL standard, this is the only way to convert between type
    <type>xml</type> and character types, but PostgreSQL also allows
    you to simply cast the value.
   </para>
   <para>
    When character string values are cast to or from type
    <type>xml</type> without going through <type>XMLPARSE</type> or
    <type>XMLSERIALIZE</type>, respectively, the choice of
    <literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
    determined by the <quote>XML option</quote>
    <indexterm><primary>XML option</primary></indexterm>
    session configuration parameter, which can be set using the
    standard command
 <synopsis>
 SET XML OPTION { DOCUMENT | CONTENT };
 </synopsis>
    or the more PostgreSQL-like syntax
 <synopsis>
 SET xmloption TO { DOCUMENT | CONTENT };
 </synopsis>
    The default is <literal>CONTENT</literal>, so all forms of XML
    data are allowed.
   </para>
   <para>
    Care must be taken when dealing with multiple character encodings
    on the client, server, and in the XML data passed through them.
    When using the text mode to pass queries to the server and query
    results to the client (which is the normal mode), PostgreSQL
    converts all character data passed between the client and the
    server and vice versa to the character encoding of the respective
    end; see <xref linkend="multibyte">.  This includes string
    representations of XML values, such as in the above examples.
    This would ordinarily mean that encoding declarations contained in
    XML data might become invalid as the character data is converted
    to other encodings while travelling between client and server,
    while the embedded encoding declaration is not changed.  To cope
    with this behavior, an encoding declaration contained in a
    character string presented for input to the <type>xml</type> type
    is <emphasis>ignored</emphasis>, and the content is always assumed
    to be in the current server encoding.  Consequently, for correct
    processing, such character strings of XML data must be sent off
    from the client in the current client encoding.  It is the
    responsibility of the client to either convert the document to the
    current client encoding before sending it off to the server or to
    adjust the client encoding appropriately.  On output, values of
    type <type>xml</type> will not have an encoding declaration, and
    clients must assume that the data is in the current client
    encoding.
   </para>
   <para>
    When using the binary mode to pass query parameters to the server
    and query results back to the client, no character set conversion
    is performed, so the situation is different.  In this case, an
    encoding declaration in the XML data will be observed, and if it
    is absent, the data will be assumed to be in UTF-8 (as required by
    the XML standard; note that PostgreSQL does not support UTF-16 at
    all).  On output, data will have an encoding declaration
    specifying the client encoding, unless the client encoding is
    UTF-8, in which case it will be omitted.
   </para>
   <para>
    Needless to say, processing XML data with PostgreSQL will be less
    error-prone and more efficient if data encoding, client encoding,
    and server encoding are the same.  Since XML data is internally
    processed in UTF-8, computations will be most efficient if the
    server encoding is also UTF-8.
   </para>
  </sect1>
  &array;
  &rowtypes;
@@ -3579,138 +3717,4 @@ SELECT * FROM pg_attribute
  </sect1>
  <sect1 id="datatype-xml">
   <title><acronym>XML</> Type</title>
   <indexterm zone="datatype-xml">
    <primary>XML</primary>
   </indexterm>
   <para>
    The data type <type>xml</type> can be used to store XML data.  Its
    advantage over storing XML data in a <type>text</type> field is that it
    checks the input values for well-formedness, and there are support
    functions to perform type-safe operations on it; see <xref
    linkend="functions-xml">.
   </para>
   <para>
    In particular, the <type>xml</type> type can store well-formed
    <quote>documents</quote>, as defined by the XML standard, as well
    as <quote>content</quote> fragments, which are defined by the
    production <literal>XMLDecl? content</literal> in the XML
    standard.  Roughly, this means that content fragments can have
    more than one top-level element or character node.  The expression
    <literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
    can be used to evaluate whether a particular <type>xml</type>
    value is a full document or only a content fragment.
   </para>
   <para>
    To produce a value of type <type>xml</type> from character data,
    use the function <function>xmlparse</function>:
 <synopsis>
 XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
 </synopsis>
    Examples:
 <programlisting><![CDATA[
 XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
 XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
 ]]></programlisting>
    While this is the only way to convert character strings into XML
    values according to the SQL standard, the PostgreSQL-specific
    syntaxes:
 <programlisting><![CDATA[
 xml '<foo>bar</foo>'
 '<foo>bar</foo>'::xml
 ]]></programlisting>
    can also be used.
   </para>
   <para>
    The <type>xml</type> type does not validate its input values
    against a possibly included document type declaration (DTD).
   </para>
   <para>
    The inverse operation, producing character string type values from
    <type>xml</type>, uses the function
    <function>xmlserialize</function>:
 <synopsis>
 XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
 </synopsis>
    <replaceable>type</replaceable> can be one of
    <type>character</type>, <type>character varying</type>, or
    <type>text</type> (or an alias name for those).  Again, according
    to the SQL standard, this is the only way to convert between type
    <type>xml</type> and character types, but PostgreSQL also allows
    you to simply cast the value.
   </para>
   <para>
    When character string values are cast to or from type
    <type>xml</type> without going through <type>XMLPARSE</type> or
    <type>XMLSERIALIZE</type>, respectively, the choice of
    <literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
    determined by the <quote>XML option</quote> session configuration
    parameter, which can be set using the standard command
 <synopsis>
 SET XML OPTION { DOCUMENT | CONTENT };
 </synopsis>
    or the more PostgreSQL-like syntax
 <synopsis>
 SET xmloption TO { DOCUMENT | CONTENT };
 </synopsis>
    The default is <literal>CONTENT</literal>, so all forms of XML
    data are allowed.
   </para>
   <para>
    Care must be taken when dealing with multiple character encodings
    on the client, server, and in the XML data passed through them.
    When using the text mode to pass queries to the server and query
    results to the client (which is the normal mode), PostgreSQL
    converts all character data passed between the client and the
    server and vice versa to the character encoding of the respective
    end; see <xref linkend="multibyte">.  This includes string
    representations of XML values, such as in the above examples.
    This would ordinarily mean that encoding declarations contained in
    XML data might become invalid as the character data is converted
    to other encodings while travelling between client and server,
    while the embedded encoding declaration is not changed.  To cope
    with this behavior, an encoding declaration contained in a
    character string presented for input to the <type>xml</type> type
    is <emphasis>ignored</emphasis>, and the content is always assumed
    to be in the current server encoding.  Consequently, for correct
    processing, such character strings of XML data must be sent off
    from the client in the current client encoding.  It is the
    responsibility of the client to either convert the document to the
    current client encoding before sending it off to the server or to
    adjust the client encoding appropriately.  On output, values of
    type <type>xml</type> will not have an encoding declaration, and
    clients must assume that the data is in the current client
    encoding.
   </para>
   <para>
    When using the binary mode to pass query parameters to the server
    and query results back to the client, no character set conversion
    is performed, so the situation is different.  In this case, an
    encoding declaration in the XML data will be observed, and if it
    is absent, the data will be assumed to be in UTF-8 (as required by
    the XML standard; note that PostgreSQL does not support UTF-16 at
    all).  On output, data will have an encoding declaration
    specifying the client encoding, unless the client encoding is
    UTF-8, in which case it will be omitted.
   </para>
   <para>
    Needless to say, processing XML data with PostgreSQL will be less
    error-prone and more efficient if data encoding, client encoding,
    and server encoding are the same.  Since XML data is internally
    processed in UTF-8, computations will be most efficient if the
    server encoding is also UTF-8.
   </para>
  </sect1>
 </chapter>
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml