mirror of
https://github.com/postgres/postgres.git
synced 2025-07-30 11:03:19 +03:00
Convert more charset/locale documentation to DocBook
@ -1,5 +1,5 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.26 2000/09/12 05:37:07 thomas Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.27 2000/09/30 16:58:20 petere Exp $

Postgres Administrator's Guide.
Derived from postgres.sgml.
@ -98,9 +98,9 @@ Derived from postgres.sgml.
&intro-ag;
&installation;
&installw;
&charset;
&runtime;
&client-auth;
&charset;
&manage-ag;
&user-manag;
&backup;
@ -1,44 +1,235 @@
<chapter id="charset">
<title>Character Sets</title>
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.3 2000/09/30 16:58:20 petere Exp $ -->

<abstract>
<para>
Describes the available language and character set support in
<productname>Postgres</productname>.
</para>
</abstract>
<chapter id="charset">
<title>Localization</>

<abstract>
<para>
Describes the available localization features from the point of
view of the administrator.
</para>
</abstract>

<para>
<productname>Postgres</productname> supports non-ASCII character
sets with two approaches:
<productname>Postgres</productname> supports localization with
three approaches:

<itemizedlist>
<listitem>
<para>
Using locale features in underlying
system libraries. This allows single-byte character sets to be
configured with a locale-specific collation order, provided that
the underlying system supports the required locale. This
technique supports only one character set per server, and can
not support multi-byte character sets.
Using the locale features of the operating system to provide
locale-specific collation order, number formatting, and other
aspects.
</para>
</listitem>

<listitem>
<para>
Using explicit multiple-byte character sets defined in the
<productname>Postgres</productname> server. These character sets
are also known to some client libraries. The number of character
sets is fixed at the time the server is compiled, and internal
operations such as string comparisons require expansion of each
character into a 32-bit word.
<productname>Postgres</productname> server to support languages
that require more characters than will fit into a single byte,
and to provide character set recoding between client and server.
The number of supported character sets is fixed at the time the
server is compiled, and internal operations such as string
comparisons require expansion of each character into a 32-bit
word.
</para>
</listitem>

<listitem>
<para>
Single-byte character set recoding provides a more lightweight
solution for users of several single-byte character sets.
</para>
</listitem>
</itemizedlist>
</para>


<sect1 id="locale">
<title>Locale Support</title>

<para>
<firstterm>Locale</> support refers to an application respecting
cultural preferences regarding alphabets, sorting, number
formatting, etc. <productname>PostgreSQL</> uses the standard ISO
C and POSIX-like locale facilities provided by the server operating
system. For additional information refer to the documentation of
your system.
</para>

<sect2>
<title>Overview</>

<para>
Locale support is not built into <productname>PostgreSQL</> by
default; to enable it, supply the <option>--enable-locale</> option
to the <filename>configure</> script:
<informalexample>
<screen>
<prompt>$ </><userinput>./configure --enable-locale</>
</screen>
</informalexample>
Locale support only affects the server; all clients are compatible
with servers with or without locale support.
</para>

<para>
The information about which particular cultural rules to use is
determined by standard environment variables. If you are getting
localized behavior from other programs, you probably have them set
up already. The simplest way to set the localization information
is the <envar>LANG</> variable, for example:
<programlisting>
export LANG=sv_SE
</programlisting>
This sets the locale to Swedish (<literal>sv</>) as spoken in
Sweden (<literal>SE</>). Other possibilities might be
<literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canada,
French). If more than one character set can be used with a locale,
the name also specifies the character set, like this:
<literal>cs_CZ.ISO8859-2</>. What locales are available under what
names on your system depends on what was provided by the operating
system vendor and what was installed.
</para>
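
<para>
The names above follow the usual POSIX form
language_territory.codeset, optionally followed by @modifier. As a
minimal sketch, the hypothetical Python helper below splits such a
name into its parts. It only parses the string; it does not check
whether the locale is actually installed on your system.
</para>

```python
def split_locale_name(name):
    """Split 'language[_territory][.codeset][@modifier]' into its parts.

    Missing parts are returned as None. Hypothetical helper for
    illustration only; it does not consult the operating system.
    """
    modifier = None
    if "@" in name:
        name, modifier = name.split("@", 1)
    codeset = None
    if "." in name:
        name, codeset = name.split(".", 1)
    territory = None
    if "_" in name:
        name, territory = name.split("_", 1)
    return name, territory, codeset, modifier


if __name__ == "__main__":
    for example in ("sv_SE", "en_US", "fr_CA", "cs_CZ.ISO8859-2", "C"):
        print(example, "->", split_locale_name(example))
```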

<para>
Occasionally it is useful to mix rules from several locales, e.g.,
use U.S. rules but Spanish messages. To do that, a set of
environment variables exists that override the default of
<envar>LANG</> for a particular category:

<informaltable>
<tgroup cols="2">
<tbody>
<row>
<entry>LC_COLLATE</>
<entry>String sort order</>
</row>
<row>
<entry>LC_CTYPE</>
<entry>Character classification (What is a letter? What is the upper-case equivalent of this letter?)</>
</row>
<row>
<entry>LC_MESSAGES</>
<entry>Language of messages</>
</row>
<row>
<entry>LC_MONETARY</>
<entry>Formatting of currency amounts</>
</row>
<row>
<entry>LC_NUMERIC</>
<entry>Formatting of numbers</>
</row>
<row>
<entry>LC_TIME</>
<entry>Formatting of dates and times</>
</row>
</tbody>
</tgroup>
</informaltable>

<envar>LC_MESSAGES</> only affects the messages that come from the
operating system, not <productname>PostgreSQL</>.
</para>
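
<para>
Under the usual POSIX rules for taking the locale from the
environment, <envar>LC_ALL</> (not listed in the table) overrides every
category variable, a category variable overrides <envar>LANG</>, and
if nothing is set the implementation default applies (the C locale
on common systems). The Python sketch below models only that lookup
order; the function name is hypothetical and it does not call the C
library.
</para>

```python
# The categories from the table; LC_ALL is handled separately.
CATEGORIES = ("LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
              "LC_MONETARY", "LC_NUMERIC", "LC_TIME")


def effective_locale(env, category):
    """Return the locale name the POSIX lookup order selects for `category`.

    `env` is any mapping of environment variables, e.g. os.environ.
    Empty values are treated as unset, as POSIX specifies.
    """
    if category not in CATEGORIES:
        raise ValueError("unknown locale category: " + category)
    for variable in ("LC_ALL", category, "LANG"):
        value = env.get(variable)
        if value:
            return value
    return "C"  # assumption: the implementation default is the C locale


if __name__ == "__main__":
    # U.S. rules, but Spanish messages (locale names are examples only).
    env = {"LANG": "en_US", "LC_MESSAGES": "es_ES"}
    for category in CATEGORIES:
        print(category, "->", effective_locale(env, category))
```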

<para>
If you want the system to behave as if it had no locale support,
use the special locale <literal>C</> or <literal>POSIX</>, or
simply unset all locale-related variables.
</para>

<para>
Once you have chosen a set of localization rules this way, you must
keep them fixed for any particular database cluster. That means
that the locales that were active when you ran <filename>initdb</>
must be kept the same when you start the postmaster. Otherwise,
the changed sort order can corrupt indexes or make your data
disappear mysteriously. It is currently not possible to change the
locales after database initialization or to use more than one set
of locales for a given database cluster.
</para>
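
<para>
To see why a changed sort order can make data seem to vanish,
consider an index that is kept sorted and searched by binary search.
The Python sketch below is a deliberate simplification, not
<productname>PostgreSQL</>'s index code: a case-insensitive key stands
in for the collation active at <filename>initdb</> time, and plain
code-point order stands in for the C locale. The index is built with
the first ordering and then searched with the second.
</para>

```python
def index_lookup(index, target, key):
    """Binary search over `index`, which must be sorted by `key`."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if key(index[mid]) < key(target):
            lo = mid + 1
        else:
            hi = mid
    return lo < len(index) and index[lo] == target


def case_insensitive(s):  # stand-in for the collation used when building
    return s.lower()


def code_point_order(s):  # stand-in for the C locale's byte-wise order
    return s


words = ["cherry", "Banana", "date", "apple"]
index = sorted(words, key=case_insensitive)  # apple, Banana, cherry, date

print(index_lookup(index, "Banana", case_insensitive))  # True
print(index_lookup(index, "Banana", code_point_order))  # False: row "disappears"
```

The entry is still physically present; only the comparison rule
changed, which is enough for the search to walk past it.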
</sect2>

<sect2>
<title>Benefits</>

<para>
In particular, locale support influences the following features:

<itemizedlist>
<listitem>
<para>
Sort order in <command>ORDER BY</> queries.
</para>
</listitem>

<listitem>
<para>
The <function>to_char</> family of functions.
</para>
</listitem>

<listitem>
<para>
The <literal>LIKE</> and <literal>~</> operators for pattern
matching.
</para>
</listitem>
</itemizedlist>
</para>

<para>
The only severe drawback of using locale support in
<productname>PostgreSQL</> is its speed, so use locale support only
if you actually need it.
</para>
</sect2>

<sect2>
<title>Problems</>

<para>
If locale support doesn't work in spite of the explanation above,
check that the locale support in your operating system is working.
One way to check whether a given locale is installed and functional
is to use <application>Perl</>, which also has locale support: if a
locale is broken, <command>perl -v</> will complain with something
like this:
<screen>
<prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
<prompt>$</> <userinput>perl -v</>
<computeroutput>
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LC_ALL = (unset),
LC_CTYPE = "not_exist",
LANG = (unset)
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
</computeroutput>
</screen>
</para>
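
<para>
If Perl is not at hand, a similar check can be sketched in Python,
whose standard <literal>locale</> module wraps the C library's
<function>setlocale</>. The function name is hypothetical. The sketch
restores the previous setting afterwards so that the check leaves
the process unchanged.
</para>

```python
import locale


def locale_is_usable(name, category=locale.LC_CTYPE):
    """Return True if the C library accepts `name` for `category`."""
    previous = locale.setlocale(category)  # query only, changes nothing
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:  # raised for unknown or uninstalled locales
        return False
    finally:
        locale.setlocale(category, previous)  # undo any change


if __name__ == "__main__":
    for name in ("C", "POSIX", "sv_SE", "cs_CZ.ISO8859-2", "not_exist"):
        print(name, locale_is_usable(name))
```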

<para>
Check that your locale files are in the right location. Possible
locations include: <filename>/usr/lib/locale</filename> (Linux,
Solaris), <filename>/usr/share/locale</filename> (Linux),
<filename>/usr/lib/nls/loc</filename> (DUX 4.0). Check the locale
man page of your system if you are not sure.
</para>

<para>
The directory <filename>src/test/locale</> contains a test suite
for <productname>PostgreSQL</>'s locale support.
</para>
</sect2>
</sect1>


<sect1 id="multibyte">
<title>Multi-byte Support</title>
<title>Multibyte Support</title>

<note>
<title>Author</title>
@ -53,7 +244,7 @@
</note>

<para>
Multi-byte (<acronym>MB</acronym>) support is intended to allow
Multibyte (<acronym>MB</acronym>) support is intended to allow
<productname>Postgres</productname> to handle
multiple-byte character sets such as EUC (Extended Unix Code), Unicode and
Mule internal code. With <acronym>MB</acronym> enabled you can use multi-byte
@ -680,7 +871,78 @@ SET CLIENT_ENCODING = 'WIN1250';
</procedure>
</sect2>
</sect1>
</chapter>


<sect1 id="recode">
<title>Single-byte character set recoding</>
<!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> -->

<para>
You can set up this feature with the <option>--enable-recode</> option
to <filename>configure</>. This option was formerly described as
<quote>Cyrillic recode support</>, which doesn't express its full
power. It can be used for <emphasis>any</> single-byte character
set recoding.
</para>

<para>
This method uses a file <filename>charset.conf</> located in
the database directory (<envar>PGDATA</>). It's a typical
configuration text file where spaces and newlines separate items
and records, and <literal>#</> starts a comment. Three keywords with
the following syntax are recognized here:
<synopsis>
BaseCharset <replaceable>server_charset</>
RecodeTable <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</>
HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
</synopsis>
</para>
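
<para>
To illustrate the file format, here is a minimal Python sketch of a
reader for <filename>charset.conf</>. It is not the server's parser:
the function name and the returned structure are hypothetical, and it
assumes one record per line, case-sensitive keywords, and that
everything after a <literal>#</> on a line is a comment. The character
set names in the sample are made-up, typing-friendly names.
</para>

```python
# Keyword -> number of arguments, per the synopsis in the text.
KEYWORDS = {"BaseCharset": 1, "RecodeTable": 3, "HostCharset": 2}


def parse_charset_conf(text):
    """Return a list of (keyword, args) tuples in file order."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0]  # drop the comment, if any
        items = line.split()          # whitespace separates items
        if not items:
            continue                  # blank or comment-only line
        keyword, args = items[0], items[1:]
        if keyword not in KEYWORDS:
            raise ValueError(f"line {lineno}: unknown keyword {keyword!r}")
        if len(args) != KEYWORDS[keyword]:
            raise ValueError(f"line {lineno}: {keyword} expects "
                             f"{KEYWORDS[keyword]} argument(s)")
        records.append((keyword, tuple(args)))
    return records


sample = """
# hypothetical example
BaseCharset    iso
RecodeTable    iso  win  iso-win.tab
HostCharset    192.168.1.0/24  win   # Windows clients
"""

for record in parse_charset_conf(sample):
    print(record)
```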

<para>
<token>BaseCharset</> defines the encoding of the database server.
Character set names are used only for mapping inside
<filename>charset.conf</>, so you can freely use typing-friendly
names.
</para>

<para>
<token>RecodeTable</> records specify translation tables between
server and client. The file name is relative to the
<envar>PGDATA</> directory. The table file format is very
simple. There are no keywords, and each line holds a pair of
decimal or hexadecimal (<literal>0x</> prefixed) character values:
<synopsis>
<replaceable>char_value</> <replaceable>translated_char_value</>
</synopsis>
</para>
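
<para>
A minimal Python sketch of reading such a table and applying it to
single-byte data follows. The function names are hypothetical, and
two behaviors are assumptions of the sketch rather than documented
server behavior: blank lines are skipped, and characters the table
does not mention pass through unchanged.
</para>

```python
def parse_value(token):
    """Accept decimal ('193') or 0x-prefixed hexadecimal ('0xC1')."""
    if token.lower().startswith("0x"):
        value = int(token[2:], 16)
    else:
        value = int(token, 10)
    if not 0 <= value <= 255:
        raise ValueError(f"{token!r} is not a single-byte value")
    return value


def load_recode_table(text):
    """Build a 256-byte translation table usable with bytes.translate()."""
    table = bytearray(range(256))  # identity mapping by default
    for line in text.splitlines():
        items = line.split()
        if not items:
            continue  # skip blank lines
        if len(items) != 2:
            raise ValueError(f"expected two values, got {line!r}")
        source, target = (parse_value(item) for item in items)
        table[source] = target
    return bytes(table)


table = load_recode_table("0x41 0x61\n66 98\n")  # 'A' -> 'a', 'B' -> 'b'
print(b"ABC".translate(table))                   # b'abC'
```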

<para>
<token>HostCharset</> records define the client character set by IP
address. You can use a single IP address, an IP mask range starting
from the given address, or an IP interval (e.g., 127.0.0.1,
192.168.1.100/24, 192.168.1.20-192.168.1.40).
</para>
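
<para>
The three address forms can be sketched with Python's standard
<literal>ipaddress</> module. This is an illustration under
assumptions, not the server's implementation: the function name is
hypothetical, only IPv4 is handled, and the mask form is read as an
ordinary network mask, so 192.168.1.100/24 matches every address whose
first 24 bits equal those of 192.168.1.100.
</para>

```python
import ipaddress


def host_matches(host_spec, client_ip):
    """Check client_ip against one HostCharset host_spec (IPv4 only).

    Forms, following the examples in the text:
      127.0.0.1                    a single address
      192.168.1.100/24             an address with a mask length
      192.168.1.20-192.168.1.40    an inclusive interval
    """
    client = ipaddress.IPv4Address(client_ip)
    if "/" in host_spec:
        # strict=False clears the host bits: .100/24 becomes 192.168.1.0/24
        network = ipaddress.IPv4Network(host_spec, strict=False)
        return client in network
    if "-" in host_spec:
        first, last = (ipaddress.IPv4Address(part.strip())
                       for part in host_spec.split("-", 1))
        return first <= client <= last
    return client == ipaddress.IPv4Address(host_spec)
```

Under this interpretation, <literal>host_matches("192.168.1.100/24",
"192.168.1.7")</> is true, while an address in 192.168.2.0/24 is not
matched.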

<para>
The <filename>charset.conf</> file is always processed up to the
end, so you can easily specify exceptions to the previous
rules. In the <filename>src/data</> directory you will find an example
<filename>charset.conf</> and a few recoding tables.
</para>
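
<para>
Because the whole file is read, a later <token>HostCharset</> record
that matches the same client overrides an earlier one. The Python
sketch below models that last-match-wins rule. The names are
hypothetical, and the default matcher is deliberately simplified: it
understands single addresses and masks, but not intervals.
</para>

```python
import ipaddress


def cidr_match(host_spec, client_ip):
    """Simplified matcher: a single address or address/mask only."""
    network = ipaddress.ip_network(host_spec, strict=False)
    return ipaddress.ip_address(client_ip) in network


def resolve_host_charset(records, client_ip, matches=cidr_match):
    """Return the charset of the LAST (host_spec, charset) record matching."""
    charset = None
    for host_spec, host_charset in records:
        if matches(host_spec, client_ip):
            charset = host_charset  # later records override earlier ones
    return charset


# A general rule for the subnet, then an exception for one host.
records = [("192.168.1.0/24", "latin2"), ("192.168.1.7", "win1250")]
print(resolve_host_charset(records, "192.168.1.7"))  # win1250
print(resolve_host_charset(records, "192.168.1.8"))  # latin2
```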

<para>
As this solution is based on the client's IP address and character
set mapping, there are obviously some restrictions as well. You
cannot use different encodings on the same host at the same
time. It is also inconvenient when your client hosts boot into
more than one operating system. Nevertheless, when these
restrictions are not limiting and you do not need multi-byte
characters, it is a simple and effective solution.
</para>
</sect1>

</chapter>

<!-- Keep this comment at the end of the file
Local variables:
@ -1,4 +1,4 @@
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.21 2000/09/29 20:21:34 petere Exp $ -->
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.22 2000/09/30 16:58:20 petere Exp $ -->

<chapter id="installation">
<title><![%flattext-install-include[<productname>PostgreSQL</> ]]>Installation Instructions</title>
@ -447,8 +447,9 @@ su - postgres
<term>--enable-recode</term>
<listitem>
<para>
Enables character set recode support. See
<filename>doc/README.Charsets</> for details on this feature.
Enables single-byte character set recode support. See
<![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]>
<![%flattext-install-ignore[<xref linkend="recode">]]> for details on this feature.
</para>
</listitem>
</varlistentry>
@ -459,7 +460,10 @@ su - postgres
<para>
Allows the use of multibyte character encodings. This is
primarily for languages like Japanese, Korean, and Chinese.
Read <filename>doc/README.mb</> for details.
Read
<![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]>
<![%flattext-install-ignore[<xref linkend="multibyte">]]>
for details.
</para>
</listitem>
</varlistentry>
@ -1,5 +1,5 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 thomas Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.42 2000/09/30 16:58:20 petere Exp $
-->

<!doctype set PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
@ -173,9 +173,9 @@ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 th
-->
&installation;
&installw;
&charset;
&runtime;
&client-auth;
&charset;
&manage-ag;
&user-manag;
&backup;
@ -1,5 +1,5 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.25 2000/09/29 20:21:34 petere Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.26 2000/09/30 16:58:20 petere Exp $
-->

<Chapter Id="runtime">
@ -1553,126 +1553,6 @@ set semsys:seminfo_semmsl=32
</sect1>


<sect1 id="locale">
<title>Locale Support</title>

<note>
<title>Acknowledgement</title>
<para>
Written by Oleg Bartunov. See <ulink
url="http://www.sai.msu.su/~megera/postgres/">Oleg's web
page</ulink> for additional information on locale and Russian
language support.
</para>
</note>

<para>
While doing a project for a company in Moscow, Russia, I
encountered the problem that <productname>Postgres</> had no
support for national alphabets. After looking for possible
workarounds I decided to develop locale support myself. I'm not
a C programmer but already had some experience with locale
programming from working with <productname>Perl</> (debugging) and
<productname>Glimpse</>. After several days of digging through the
<productname>Postgres</> source tree I made very minor corrections
to <filename>src/backend/utils/adt/varlena.c</> and
<filename>src/backend/main/main.c</> and got what I needed! I added
support only for <envar>LC_CTYPE</envar> and
<envar>LC_COLLATE</envar>, but later <envar>LC_MONETARY</envar> was
added by others. I got many messages from people about this patch,
so I decided to send it to the developers and (to my surprise) it was
incorporated into the <productname>Postgres</> distribution.
</para>

<para>
People often complain that locale doesn't work for them. There are
several common mistakes:

<itemizedlist>
<listitem>
<para>
Didn't properly configure <productname>Postgres</> before
compilation. You must run <filename>configure</> with the
<option>--enable-locale</> option to enable locale support.
</para>
</listitem>

<listitem>
<para>
Didn't set up the environment correctly when starting the postmaster.
You must define the environment variables <envar>LC_CTYPE</envar> and
<envar>LC_COLLATE</envar> before running the postmaster because the
backend gets its locale information from the environment. I use the
following shell script:
<programlisting>
#!/bin/sh

export LC_CTYPE=koi8-r
export LC_COLLATE=koi8-r
postmaster -B 1024 -S -D/usr/local/pgsql/data/ -o '-Fe'
</programlisting>
</para>
</listitem>

<listitem>
<para>
Broken locale support in the operating system (for example,
locale support in libc under Linux has changed several times, and
this caused a lot of problems). Perl also has locale support,
and if a locale is broken, <command>perl -v</> will complain with
something like:
<screen>
<prompt>$</> <userinput>export LC_CTYPE='not_exist'</>
<prompt>$</> <userinput>perl -v</>
<computeroutput>
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LC_ALL = (unset),
LC_CTYPE = "not_exist",
LANG = (unset)
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
</computeroutput>
</screen>
</para>
</listitem>

<listitem>
<para>
Wrong location of locale files. Possible locations include:
<filename>/usr/lib/locale</filename> (Linux, Solaris),
<filename>/usr/share/locale</filename> (Linux),
<filename>/usr/lib/nls/loc</filename> (DUX 4.0).

Check <command>man locale</command> to find the correct
location. Under Linux I made a symbolic link between
<filename>/usr/lib/locale</filename> and
<filename>/usr/share/locale</filename> to be sure that the next
libc will not break my locale.
</para>
</listitem>
</itemizedlist>
</para>

<formalpara>
<title>What are the Benefits?</title>
<para>
You can use the <literal>~*</> operator and <command>ORDER BY</> on
strings containing characters from national alphabets. Non-English
users definitely need that.
</para>
</formalpara>

<formalpara>
<title>What are the Drawbacks?</title>
<para>
There is one evident drawback of using locale: its speed. So use
locale only if you really need it.
</para>
</formalpara>
</sect1>


<sect1 id="postmaster-shutdown">
<title>Shutting down the server</title>