Convert more charset/locale documentation to DocBook

2025-10-16 17:07:43 +03:00 · 2000-09-30 16:58:20 +00:00
parent 333cbc2dab
commit 0ba77c14aa
7 changed files with 299 additions and 373 deletions
--- a/doc/src/sgml/admin.sgml
+++ b/doc/src/sgml/admin.sgml
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.26 2000/09/12 05:37:07 thomas Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.27 2000/09/30 16:58:20 petere Exp $

 Postgres Administrator's Guide.
 Derived from postgres.sgml.
@@ -98,9 +98,9 @@ Derived from postgres.sgml.
  &intro-ag;
  &installation;
  &installw;
-  &charset;
  &runtime;
  &client-auth;
+  &charset;
  &manage-ag;
  &user-manag;
  &backup;
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1,44 +1,235 @@
- <chapter id="charset">
-  <title>Character Sets</title>
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.3 2000/09/30 16:58:20 petere Exp $ -->

-  <abstract>
-   <para>
-    Describes the available language and character set support in
-    <productname>Postgres</productname>.
-   </para>
-  </abstract>
+<chapter id="charset">
+ <title>Localization</>
+
+ <abstract>
+  <para>
+   Describes the available localization features from the point of
+   view of the administrator.
+  </para>
+ </abstract>

  <para>
-   <productname>Postgres</productname> supports non-ASCII character
-   sets with two approaches: 
+   <productname>Postgres</productname> supports localization with
+   three approaches:

   <itemizedlist>
    <listitem>
     <para>
-      Using locale features in underlying
-      system libraries. This allows single-byte character sets to be
-      configured with a locale-specific collation order, provided that
-      the underlying system supports the required locale. This
-      technique supports only one character set per server, and can
-      not support multi-byte character sets.
+      Using the locale features of the operating system to provide
+      locale-specific collation order, number formatting, and other
+      aspects.
     </para>
    </listitem>

    <listitem>
     <para>
      Using explicit multiple-byte character sets defined in the
-      <productname>Postgres</productname> server. These character sets
-      are also known to some client libraries. The number of character
-      sets is fixed at the time the server is compiled, and internal
-      operations such as string comparisons require expansion of each
-      character into a 32-bit word.
+      <productname>Postgres</productname> server to support languages
+      that require more characters than will fit into a single byte,
+      and to provide character set recoding between client and server.
+      The number of supported character sets is fixed at the time the
+      server is compiled, and internal operations such as string
+      comparisons require expansion of each character into a 32-bit
+      word.
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
+      Single-byte character recoding provides a more lightweight
+      solution for users of multiple, yet single-byte, character sets.
     </para>
    </listitem>
   </itemizedlist>
  </para>

+
+ <sect1 id="locale">
+  <title>Locale Support</title>
+  
+  <para>
+   <firstterm>Locale</> support refers to an application respecting
+   cultural preferences regarding alphabets, sorting, number
+   formatting, etc.  <productname>PostgreSQL</> uses the standard ISO
+   C and POSIX-like locale facilities provided by the server operating
+   system.  For additional information refer to the documentation of your
+   system.
+  </para>
+
+  <sect2>
+   <title>Overview</>
+
+   <para>
+    Locale support is not built into <productname>PostgreSQL</> by
+    default; to enable it, supply the <option>--enable-locale</> option
+    to the <filename>configure</> script:
+<informalexample>
+<screen>
+<prompt>$ </><userinput>./configure --enable-locale</>
+</screen>
+</informalexample>
+    Locale support only affects the server; all clients are compatible
+    with servers with or without locale support.
+   </para>
+
+   <para>
+    The information about which particular cultural rules to use is
+    determined by standard environment variables.  If you are getting
+    localized behavior from other programs you probably have them set
+    up already.  The simplest way to set the localization information
+    is the <envar>LANG</> variable, for example:
+<programlisting>
+export LANG=sv_SE
+</programlisting>
+    This sets the locale to Swedish (<literal>sv</>) as spoken in
+    Sweden (<literal>SE</>).  Other possibilities might be
+    <literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canadian
+    French).  If more than one character set can be useful for a
+    locale, then the specifications look like this:
+    <literal>cs_CZ.ISO8859-2</>. What locales are available under what
+    names on your system depends on what was provided by the operating
+    system vendor and what was installed.
+   </para>
+
+   <para>
+    Occasionally it is useful to mix rules from several locales, e.g.,
+    use U.S. rules but Spanish messages.  To do that, a set of
+    environment variables exists that override the default of
+    <envar>LANG</> for a particular category:
+
+    <informaltable>
+     <tgroup cols="2">
+      <tbody>
+       <row>
+        <entry>LC_COLLATE</>
+        <entry>String sort order</>
+       </row>
+       <row>
+        <entry>LC_CTYPE</>
+        <entry>Character classification (What is a letter? What is the upper-case equivalent of this letter?)</>
+       </row>
+       <row>
+        <entry>LC_MESSAGES</>
+        <entry>Language of messages</>
+       </row>
+       <row>
+        <entry>LC_MONETARY</>
+        <entry>Formatting of currency amounts</>
+       </row>
+       <row>
+        <entry>LC_NUMERIC</>
+        <entry>Formatting of numbers</>
+       </row>
+       <row>
+        <entry>LC_TIME</>
+        <entry>Formatting of dates and times</>
+       </row>
+      </tbody>
+     </tgroup>
+    </informaltable>
+
+    <envar>LC_MESSAGES</> only affects the messages that come from the
+    operating system, not <productname>PostgreSQL</>.
+   </para>
+
+   <para>
+    If you want the system to behave as if it had no locale support,
+    use the special locale <literal>C</> or <literal>POSIX</>, or
+    simply unset all locale-related variables.
+   </para>
+
+   <para>
+    Once you have chosen a set of localization rules this way you must
+    keep them fixed for any particular database cluster.  That means
+    that the locales that were active when you ran <filename>initdb</>
+    must be kept the same when you start the postmaster.  Otherwise,
+    the changed sort order can corrupt indexes or make your data
+    disappear mysteriously.  It is currently not possible to change the
+    locales after database initialization or to use more than one set
+    of locales for a given database cluster.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Benefits</>
+
+   <para>
+    Locale support influences in particular the following features:
+
+    <itemizedlist>
+     <listitem>
+      <para>
+       Sort order in <command>ORDER BY</> queries.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The <function>to_char</> family of functions
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       The <literal>LIKE</> and <literal>~</> operators for pattern
+       matching
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+    The only severe drawback of using the locale support in
+    <productname>PostgreSQL</> is reduced speed.  So use locale
+    support only if you actually need it.
+   </para>
+  </sect2>
+
+  <sect2>
+   <title>Problems</>
+
+   <para>
+    If locale support doesn't work in spite of the explanation above,
+    check that the locale support in your operating system is okay.
+    To check whether a given locale is installed and functional you
+    can use <application>Perl</>, for example.  Perl also has support
+    for locales, and if a locale is broken <command>perl -v</> will
+    complain with something like this:
+<screen>
+<prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
+<prompt>$</> <userinput>perl -v</>
+<computeroutput>
+perl: warning: Setting locale failed.
+perl: warning: Please check that your locale settings:
+LC_ALL = (unset),
+LC_CTYPE = "not_exist",
+LANG = (unset)
+are supported and installed on your system.
+perl: warning: Falling back to the standard locale ("C").
+</computeroutput>
+</screen>
+   </para>
+
+   <para>
+    Check that your locale files are in the right location.  Possible
+    locations include: <filename>/usr/lib/locale</filename> (Linux,
+    Solaris), <filename>/usr/share/locale</filename> (Linux),
+    <filename>/usr/lib/nls/loc</filename> (DUX 4.0).  Check the locale
+    man page of your system if you are not sure.
+   </para>
+
+   <para>
+    The directory <filename>src/test/locale</> contains a test suite
+    for <productname>PostgreSQL</>'s locale support.
+   </para>
+  </sect2>
+ </sect1>
+
+
  <sect1 id="multibyte">
-   <title>Multi-byte Support</title>
+   <title>Multibyte Support</title>

   <note>
    <title>Author</title>
@@ -53,7 +244,7 @@
   </note>

   <para>
-    Multi-byte (<acronym>MB</acronym>) support is intended to allow
+    Multibyte (<acronym>MB</acronym>) support is intended to allow
    <productname>Postgres</productname> to handle
    multiple-byte character sets such as EUC (Extended Unix Code), Unicode and
    Mule internal code. With <acronym>MB</acronym> enabled you can use multi-byte
@@ -680,7 +871,78 @@ SET CLIENT_ENCODING = 'WIN1250';
    </procedure>
   </sect2>
  </sect1>
- </chapter>
+
+
+ <sect1 id="recode">
+  <title>Single-byte Character Set Recoding</>
+<!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> -->
+
+  <para>
+   You can set up this feature with the <option>--enable-recode</> option
+   to <filename>configure</>. This option was formerly described as
+   <quote>Cyrillic recode support</> which doesn't express all its
+   power. It can be used for <emphasis>any</> single-byte character
+   set recoding.
+  </para>
+
+  <para>
+   This method uses a <filename>charset.conf</> file located in
+   the database directory (<envar>PGDATA</>).  It's a typical
+   configuration text file where spaces and newlines separate items
+   and records and # specifies comments.  Three keywords with the
+   following syntax are recognized here:
+<synopsis>
+BaseCharset      <replaceable>server_charset</>
+RecodeTable      <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</>
+HostCharset      <replaceable>host_spec</>    <replaceable>host_charset</>
+</synopsis>
+  </para>
+
+  <para>
+   <token>BaseCharset</> defines the encoding of the database server.
+   All character set names are only used for mapping inside of
+   <filename>charset.conf</> so you can freely use typing-friendly
+   names.
+  </para>
+
+  <para>
+   <token>RecodeTable</> records specify translation tables between
+   server and client.  The file name is relative to the
+   <envar>PGDATA</> directory.  The table file format is very
+   simple. There are no keywords and characters are represented by a
+   pair of decimal or hexadecimal (0x prefixed) values on single
+   lines:
+<synopsis>
+<replaceable>char_value</>   <replaceable>translated_char_value</>
+</synopsis>
+  </para>
+
+  <para>
+   <token>HostCharset</> records define the client character set by IP
+   address. You can use a single IP address, an IP mask range starting
+   from the given address or an IP interval (e.g., 127.0.0.1,
+   192.168.1.100/24, 192.168.1.20-192.168.1.40).
+  </para>
+
+  <para>
+   The <filename>charset.conf</> file is always processed up to the
+   end, so you can easily specify exceptions from the previous
+   rules. In <filename>src/data</> you will find an example
+   <filename>charset.conf</> and a few recoding tables.
+  </para>
+
+  <para>
+   As this solution is based on the client's IP address and character
+   set mapping there are obviously some restrictions as well. You
+   cannot use different encodings on the same host at the same
+   time. It is also inconvenient when you boot your client hosts into
+   more than one operating system.  Nevertheless, when these
+   restrictions are not limiting and you do not need multi-byte
+   characters, then it is a simple and effective solution.
+  </para>
+ </sect1>
+
+</chapter>

 <!-- Keep this comment at the end of the file
 Local variables:
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -1,4 +1,4 @@
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.21 2000/09/29 20:21:34 petere Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.22 2000/09/30 16:58:20 petere Exp $ -->

 <chapter id="installation">
 <title><![%flattext-install-include[<productname>PostgreSQL</> ]]>Installation Instructions</title>
@@ -447,8 +447,9 @@ su - postgres
       <term>--enable-recode</term>
       <listitem>
        <para>
-         Enables character set recode support. See
-         <filename>doc/README.Charsets</> for details on this feature.
+         Enables single-byte character set recode support. See
+         <![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]>
+         <![%flattext-install-ignore[<xref linkend="recode">]]> for details on this feature.
        </para>
       </listitem>
      </varlistentry>
@@ -459,7 +460,10 @@ su - postgres
        <para>
         Allows the use of multibyte character encodings. This is
         primarily for languages like Japanese, Korean, and Chinese.
-         Read <filename>doc/README.mb</> for details.
+         Read 
+         <![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]>
+         <![%flattext-install-ignore[<xref linkend="multibyte">]]>
+         for details.
        </para>
       </listitem>
      </varlistentry>
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 thomas Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.42 2000/09/30 16:58:20 petere Exp $
 -->

 <!doctype set PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
@@ -173,9 +173,9 @@ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 th
 -->
  &installation;
  &installw;
-  &charset;
  &runtime;
  &client-auth;
+  &charset;
  &manage-ag;
  &user-manag;
  &backup;
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.25 2000/09/29 20:21:34 petere Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.26 2000/09/30 16:58:20 petere Exp $
 -->

 <Chapter Id="runtime">
@@ -1553,126 +1553,6 @@ set semsys:seminfo_semmsl=32
 </sect1>


- <sect1 id="locale">
-  <title>Locale Support</title>
-  
-  <note>
-   <title>Acknowledgement</title>
-   <para>
-    Written by Oleg Bartunov. See <ulink
-    url="http://www.sai.msu.su/~megera/postgres/">Oleg's web
-    page</ulink> for additional information on locale and Russian
-    language support.
-   </para>
-  </note>
-
-  <para>
-   While doing a project for a company in Moscow, Russia, I
-   encountered the problem that <productname>Postgres</> had no
-   support of national alphabets. After looking for possible
-   workarounds I decided to develop support of locale myself. I'm not
-   a C programmer but already had some experience with locale
-   programming when I work with <productname>Perl</> (debugging) and
-   <productname>Glimpse</>. After several days of digging through the
-   <productname>Postgres</> source tree I made very minor corections
-   to <filename>src/backend/utils/adt/varlena.c</> and
-   <filename>src/backend/main/main.c</> and got what I needed! I did
-   support only for <envar>LC_CTYPE</envar> and
-   <envar>LC_COLLATE</envar>, but later <envar>LC_MONETARY</envar> was
-   added by others. I got many messages from people about this patch
-   so I decided to send it to developers and (to my surprise) it was
-   incorporated into the <productname>Postgres</> distribution.
-  </para>
-
-  <para>
-   People often complain that locale doesn't work for them. There are
-   several common mistakes:
-   
-   <itemizedlist>
-    <listitem>
-     <para>
-      Didn't properly configure <productname>Postgres</> before
-      compilation. You must run <filename>configure</> with the
-      <option>--enable-locale</> option to enable locale support.
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      Didn't setup environment correctly when starting postmaster. You
-      must define environment variables <envar>LC_CTYPE</envar> and
-      <envar>LC_COLLATE</envar> before running postmaster because
-      backend gets information about locale from environment. I use
-      following shell script:
-<programlisting>
-#!/bin/sh
-
-export LC_CTYPE=koi8-r
-export LC_COLLATE=koi8-r
-postmaster -B 1024 -S -D/usr/local/pgsql/data/ -o '-Fe'
-</programlisting>
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      Broken locale support in the operating system (for example,
-      locale support in libc under Linux several times has changed and
-      this caused a lot of problems). Perl has also support of locale
-      and if locale is broken <command>perl -v</> will complain
-      something like:
-<screen>
-<prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
-<prompt>$</> <userinput>perl -v</>
-<computeroutput>
-perl: warning: Setting locale failed.
-perl: warning: Please check that your locale settings:
-LC_ALL = (unset),
-LC_CTYPE = "not_exist",
-LANG = (unset)
-are supported and installed on your system.
-perl: warning: Falling back to the standard locale ("C").
-</computeroutput>
-</screen>
-     </para>
-    </listitem>
-
-    <listitem>
-     <para>
-      Wrong location of locale files. Possible locations include:
-      <filename>/usr/lib/locale</filename> (Linux, Solaris),
-      <filename>/usr/share/locale</filename> (Linux),
-      <filename>/usr/lib/nls/loc</filename> (DUX 4.0).
-      
-      Check <command>man locale</command> to find the correct
-      location. Under Linux I made a symbolic link between
-      <filename>/usr/lib/locale</filename> and
-      <filename>/usr/share/locale</filename> to be sure that the next
-      libc will not break my locale.
-     </para>
-    </listitem>
-   </itemizedlist>
-  </para>
-
-  <formalpara>
-   <title>What are the Benefits?</title> 
-   <para>
-    You can use ~* and order by operators for strings contain
-    characters from national alphabets. Non-english users definitely
-    need that.
-   </para>
-  </formalpara>
-
-  <formalpara>
-   <title>What are the Drawbacks?</title>
-   <para>
-    There is one evident drawback of using locale - its speed! So, use
-    locale only if you really need it.
-   </para>
-  </formalpara>
- </sect1>
-
-
 <sect1 id="postmaster-shutdown">
  <title>Shutting down the server</title>