mirror of https://github.com/postgres/postgres.git, synced 2025-10-25 13:17:41 +03:00
	Convert more charset/locale documentation to DocBook
		| @@ -1,113 +0,0 @@ | |||||||
|    |  | ||||||
|   PostgreSQL Charsets README |  | ||||||
|   Josef Balatka, <balatka@email.cz> |  | ||||||
|   Draft v0.1, Tue Jul 20 15:49:07 CEST 1999 |  | ||||||
|    |  | ||||||
|   This document is a brief overview of the national charsets support |  | ||||||
|   that PostgreSQL ver. 6.5 has implemented. Various compilation options |  | ||||||
|   and setup tips are mentioned here to be helpful in the particular use. |  | ||||||
|    |  | ||||||
|   --------------------------------------------------------------------------- |  | ||||||
|    |  | ||||||
|   Table of Contents |  | ||||||
|    |  | ||||||
|   1. Locale awareness |  | ||||||
|    |  | ||||||
|   2. Single-byte charsets recoding |  | ||||||
|    |  | ||||||
|   3. Multi-byte support/recoding |  | ||||||
|    |  | ||||||
|   4. Credits |  | ||||||
|    |  | ||||||
|   --------------------------------------------------------------------------- |  | ||||||
|    |  | ||||||
|   1. Locale awareness |  | ||||||
|    |  | ||||||
|      PostgreSQL server supports both locale aware and locale not aware |  | ||||||
|      (default) operational modes. You can determine this mode during the |  | ||||||
|      configuration stage of the installation with --enable-locale option. |  | ||||||
|    |  | ||||||
|      If you don't use --enable-locale, the multi-language code will not be |  | ||||||
|      compiled and PostgreSQL will behave as an ASCII compliant application. |  | ||||||
|      This mode is useful for its speed but only provided that you don't |  | ||||||
|      have to consider national specific chars. |  | ||||||
|  |  | ||||||
|      With --enable-locale you will get a locale aware server using LC_* |  | ||||||
|      environment variables to determine how to process national specifics. |  | ||||||
|      In this case strcoll(3) and similar functions are used internally |  | ||||||
|      so speed is somewhat lower. |  | ||||||
|    |  | ||||||
|      Notice here that --enable-locale is sufficient when all your clients |  | ||||||
|      use the same single-byte encoding as the database server does. |  | ||||||
|    |  | ||||||
|      When your clients use an encoding different from the server's, you |  | ||||||
|      also have to use the --enable-recode or --with-mb=<encoding> option on |  | ||||||
|      the server side, or a particular client that does recoding itself (e.g. |  | ||||||
|      there exists a PostgreSQL ODBC driver for Win32 with various Cyrillic |  | ||||||
|      encoding capability). Option --with-mb=<encoding> is necessary for the |  | ||||||
|      multi-byte charsets support. |  | ||||||
|    |  | ||||||
|    |  | ||||||
|   2. Single-byte charsets recoding |  | ||||||
|    |  | ||||||
|      You can set up this feature with --enable-recode option. This option |  | ||||||
|      is described as 'enable Cyrillic recode support' which doesn't express |  | ||||||
|      all its power. It can be used for *any* single-byte charset recoding. |  | ||||||
|    |  | ||||||
|      This method uses charset.conf file located in the $PGDATA directory. |  | ||||||
|      It's a typical configuration text file where spaces and newlines |  | ||||||
|      separate items and records and # specifies comments. Three keywords |  | ||||||
|      with the following syntax are recognized here: |  | ||||||
|    |  | ||||||
|        BaseCharset	<server_charset> |  | ||||||
|        RecodeTable	<from_charset>     <to_charset>    <file_name> |  | ||||||
|        HostCharset	<host_spec>	   <host_charset> |  | ||||||
|    |  | ||||||
|      BaseCharset defines encoding of the database server. All charset |  | ||||||
|      names are only used for mapping inside the charset.conf so you can |  | ||||||
|      freely use typing-friendly names. |  | ||||||
|       |  | ||||||
|      RecodeTable records specify translation table between server and client. |  | ||||||
|      The file name is relative to the $PGDATA directory. Table file format |  | ||||||
|      is very simple. There are no keywords and characters are represented by |  | ||||||
|      a pair of decimal or hexadecimal (0x prefixed) values on single lines: |  | ||||||
|    |  | ||||||
|        <char_value>  <translated_char_value> |  | ||||||
|    |  | ||||||
|      HostCharset records define IP address and charset. You can use a single |  | ||||||
|      IP address, an IP mask range starting from the given address or an IP |  | ||||||
|      interval (e.g. 127.0.0.1, 192.168.1.100/24, 192.168.1.20-192.168.1.40) |  | ||||||
|    |  | ||||||
|      The charset.conf is always processed up to the end, so you can easily |  | ||||||
|      specify exceptions from the previous rules. In the src/data you will |  | ||||||
|      find charset.conf example and a few recoding tables. |  | ||||||
|    |  | ||||||
|      As this solution is based on the client's IP address / charset mapping |  | ||||||
|      there are obviously some restrictions as well. You can't use different |  | ||||||
|      encoding on the same host at the same time. It's also inconvenient when |  | ||||||
|      you boot your client hosts into more operating systems. |  | ||||||
|      Nevertheless, when these restrictions are not limiting and you don't |  | ||||||
|      need multi-byte chars, then it's a simple and effective solution. |  | ||||||
|    |  | ||||||
|    |  | ||||||
|   3. Multi-byte support/recoding |  | ||||||
|    |  | ||||||
|      It's a new generation of charset encoding in PostgreSQL designed as a |  | ||||||
|      more complex solution supporting both single-byte and multi-byte chars. |  | ||||||
|      You can set up this feature with --with-mb=<encoding> option. |  | ||||||
|    |  | ||||||
|      There is no IP mapping file and recoding is controlled through the new |  | ||||||
|      SQL statements. Recoding tables are included in the code. Many national |  | ||||||
|      charsets are already supported and further will follow. |  | ||||||
|    |  | ||||||
|      See doc/README.mb, doc/README.mb.jp to get detailed instruction on how |  | ||||||
|      to use the multibyte support. In the file doc/README.locale there is |  | ||||||
|      a particular instruction on usage of the multibyte support with Cyrillic. |  | ||||||
|    |  | ||||||
|    |  | ||||||
|   4. Credits |  | ||||||
|    |  | ||||||
|      I'd like to thank the PostgreSQL development team and all contributors |  | ||||||
|      for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and |  | ||||||
|      Tatsuo Ishii for opening the door into the multi-language world. |  | ||||||
|    |  | ||||||
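The recoding table format described in section 2 of the README above (one pair of decimal or 0x-prefixed values per line) can be illustrated with a minimal Python sketch. This is not PostgreSQL's implementation: the function names are hypothetical, the sample pairs are illustrative rather than copied from a shipped table, and treating `#` as a comment marker inside table files is an assumption borrowed from the charset.conf description.

```python
def parse_recode_table(text):
    """Build a 256-entry byte translation table from
    '<char_value> <translated_char_value>' lines."""
    table = list(range(256))  # bytes not listed map to themselves
    for raw_line in text.splitlines():
        # Assumption: '#' starts a comment, as it does in charset.conf.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # int(..., 0) accepts both decimal and 0x-prefixed values.
        src, dst = (int(field, 0) for field in line.split())
        table[src] = dst
    return bytes(table)


def recode(data, table):
    """Translate a byte string through the table."""
    return data.translate(table)


# Two illustrative pairs, one hexadecimal and one decimal.
sample = """
0xA9 0x8A
185  154
"""
table = parse_recode_table(sample)
recoded = recode(b"\xa9\xb9A", table)
```

Bytes without an entry pass through unchanged here, which is a design choice of the sketch; the README does not say how unlisted values are handled.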
| @@ -1,107 +0,0 @@ | |||||||
| =========== |  | ||||||
| 1999 Jul 21 |  | ||||||
| =========== |  | ||||||
|  |  | ||||||
|    Josef Balatka, <balatka@email.cz> asked us not to remove RECODE and sent me |  | ||||||
| Czech ISO-8859-2 -> WIN-1250 translation table. |  | ||||||
|    RECODE no longer contains just Cyrillic RECODE and will stay in |  | ||||||
| PostgreSQL. |  | ||||||
|  |  | ||||||
|    He also created some bits of documentation, mostly concerning RECODE - |  | ||||||
| see README.Charsets. |  | ||||||
|  |  | ||||||
|  |  | ||||||
| =========== |  | ||||||
| 1999 Apr 14 |  | ||||||
| =========== |  | ||||||
|  |  | ||||||
|    Tatsuo Ishii <t-ishii@sra.co.jp> updated Multibyte support extending it |  | ||||||
| to Cyrillic language. Now PostgreSQL supports KOI8-R, WIN-1251, ISO8859-5 |  | ||||||
| and CP866 (ALT) encodings. |  | ||||||
|  |  | ||||||
|    Short instruction on using this feature follows. Longer discussion of |  | ||||||
| Multibyte support is in README.mb. |  | ||||||
|  |  | ||||||
|    WARNING! Now with Multibyte support Cyrillic RECODE is declared obsolete |  | ||||||
| and will be removed from Postgres. If you are using RECODE consider |  | ||||||
| switching to Multibyte support. |  | ||||||
|  |  | ||||||
|    Instructions on how to prepare Postgres for Cyrillic Multibyte support. |  | ||||||
|    ---------------------------------------------------------------------- |  | ||||||
|  |  | ||||||
|    First, you need to backup all your databases. I recommend to backup the |  | ||||||
| entire Postgres directory, including binaries and libraries - thus you can |  | ||||||
| easily restore if something goes wrong. |  | ||||||
|  |  | ||||||
|    Dump your data: pg_dumpall > dump.db |  | ||||||
|  |  | ||||||
|    Stop postmaster. |  | ||||||
|  |  | ||||||
|    Configure, compile and install Postgres. (I'll mostly talk about KOI8-R |  | ||||||
| encoding, this is just to make examples a little more clear; you can use |  | ||||||
| any supported encoding.) |  | ||||||
|  |  | ||||||
|    cd src |  | ||||||
|    ./configure --enable-locale --with-mb=KOI8 |  | ||||||
|    make |  | ||||||
|    make install |  | ||||||
|  |  | ||||||
|    Make sure you've backed up your databases. Doublecheck your backup. I |  | ||||||
| really mean it - make regular backups and test your backups sometimes by |  | ||||||
| fake restore. |  | ||||||
|  |  | ||||||
|    Remove your data directory (better, rename or move it). |  | ||||||
|  |  | ||||||
|    Run initdb saying your primary encoding: initdb -e KOI8. If you omit |  | ||||||
| encoding, primary encoding from configure will be taken. |  | ||||||
|  |  | ||||||
|    Start postmaster. |  | ||||||
|  |  | ||||||
|    Create databases: createdb -e KOI8. Again, you can omit encoding - |  | ||||||
| default encoding will be used. You are not forced to use the same encoding |  | ||||||
| for all your databases - you can create different databases with different |  | ||||||
| encodings. |  | ||||||
|  |  | ||||||
|    Load your data from the dump you've created: psql < dump.db |  | ||||||
|  |  | ||||||
|    That's all! Now you are ready to enjoy the full power of Multibyte |  | ||||||
| support. |  | ||||||
|  |  | ||||||
|    To use Multibyte support you do not need to do something special - just |  | ||||||
| execute your queries. If client program does not set encoding, it will get |  | ||||||
| the data in database encoding. But client may ask Postgres to do automatic |  | ||||||
| server-to-client and client-to-server conversions. There are 2 (two) ways |  | ||||||
| client program declares its encoding: |  | ||||||
|    1) client explicitly executes the query SET CLIENT_ENCODING TO 'win'; |  | ||||||
|    2) client started with environment variable set. Examples - |  | ||||||
| using sh syntax: |  | ||||||
|    PGCLIENTENCODING='win'; export PGCLIENTENCODING |  | ||||||
| using csh syntax: |  | ||||||
|    setenv PGCLIENTENCODING 'win' |  | ||||||
|  |  | ||||||
|    Setting PGCLIENTENCODING even if you use the same client encoding as the |  | ||||||
| database avoids the overhead of asking for the database encoding while |  | ||||||
| initiating the connection, so it is a good idea to set it in any case. |  | ||||||
|  |  | ||||||
|    Now you may run test suite and see Multibyte support in action. Go to |  | ||||||
| .../src/test/locale and run |  | ||||||
|    make clean all test-koi2win |  | ||||||
|  |  | ||||||
|  |  | ||||||
| =========== |  | ||||||
| 1998 Nov 20 |  | ||||||
| =========== |  | ||||||
|  |  | ||||||
|    I extended locale support, originally written by Oleg Bartunov |  | ||||||
| <oleg@sai.msu.su>. Now ORDER BY (if PostgreSQL configured with |  | ||||||
| --enable-locale) uses strcoll() for all text fields: char(n), varchar(n), |  | ||||||
| text. |  | ||||||
|  |  | ||||||
|    I included test suite .../src/test/locale. I didn't include this in |  | ||||||
| the regression test because not many people require locale support. Read |  | ||||||
| .../src/test/locale/README for details on the test suite. |  | ||||||
|  |  | ||||||
|    Many thanks to Oleg Bartunov (oleg@sai.msu.su) and Thomas G. Lockhart |  | ||||||
| (lockhart@alumni.caltech.edu) for hints, tips, help and discussion. |  | ||||||
|  |  | ||||||
| Oleg. |  | ||||||
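The 1998 note above says that ORDER BY uses strcoll() when locale support is enabled. The effect of that routine can be sketched in Python, whose `locale.strcoll` wraps the C library function. This only illustrates strcoll() ordering; it is not PostgreSQL code. The helper name is hypothetical, and any locale other than "C" (for example "sv_SE") must be installed on the system or `locale.Error` is raised.

```python
import functools
import locale


def sort_like_order_by(values, locale_name="C"):
    """Sort strings with strcoll() under the given LC_COLLATE locale."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(values, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore it


words = ["banana", "Apple", "apple", "Banana"]
# In the "C" locale strings compare by character code, so all
# upper-case letters sort before lower-case ones.
c_order = sort_like_order_by(words)
```

Passing a national locale name instead of "C" would typically interleave upper and lower case according to that locale's collation rules, which is the behavior --enable-locale is meant to provide.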
| @@ -1,5 +1,5 @@ | |||||||
| <!-- | <!-- | ||||||
| $Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.26 2000/09/12 05:37:07 thomas Exp $ | $Header: /cvsroot/pgsql/doc/src/sgml/Attic/admin.sgml,v 1.27 2000/09/30 16:58:20 petere Exp $ | ||||||
|  |  | ||||||
| Postgres Administrator's Guide. | Postgres Administrator's Guide. | ||||||
| Derived from postgres.sgml. | Derived from postgres.sgml. | ||||||
| @@ -98,9 +98,9 @@ Derived from postgres.sgml. | |||||||
|   &intro-ag; |   &intro-ag; | ||||||
|   &installation; |   &installation; | ||||||
|   &installw; |   &installw; | ||||||
|   &charset; |  | ||||||
|   &runtime; |   &runtime; | ||||||
|   &client-auth; |   &client-auth; | ||||||
|  |   &charset; | ||||||
|   &manage-ag; |   &manage-ag; | ||||||
|   &user-manag; |   &user-manag; | ||||||
|   &backup; |   &backup; | ||||||
|   | |||||||
| @@ -1,44 +1,235 @@ | |||||||
|  <chapter id="charset"> | <!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.3 2000/09/30 16:58:20 petere Exp $ --> | ||||||
|   <title>Character Sets</title> |  | ||||||
|  | <chapter id="charset"> | ||||||
|  |  <title>Localization</> | ||||||
|  |  | ||||||
|  <abstract> |  <abstract> | ||||||
|   <para> |   <para> | ||||||
|     Describes the available language and character set support in |    Describes the available localization features from the point of | ||||||
|     <productname>Postgres</productname>. |    view of the administrator. | ||||||
|   </para> |   </para> | ||||||
|  </abstract> |  </abstract> | ||||||
|  |  | ||||||
|   <para> |   <para> | ||||||
|    <productname>Postgres</productname> supports non-ASCII character |    <productname>Postgres</productname> supports localization with | ||||||
|    sets with two approaches:  |    three approaches: | ||||||
|  |  | ||||||
|    <itemizedlist> |    <itemizedlist> | ||||||
|     <listitem> |     <listitem> | ||||||
|      <para> |      <para> | ||||||
|       Using locale features in underlying |       Using the locale features of the operating system to provide | ||||||
|       system libraries. This allows single-byte character sets to be |       locale-specific collation order, number formatting, and other | ||||||
|       configured with a locale-specific collation order, provided that |       aspects. | ||||||
|       the underlying system supports the required locale. This |  | ||||||
|       technique supports only one character set per server, and can |  | ||||||
|       not support multi-byte character sets. |  | ||||||
|      </para> |      </para> | ||||||
|     </listitem> |     </listitem> | ||||||
|  |  | ||||||
|     <listitem> |     <listitem> | ||||||
|      <para> |      <para> | ||||||
|       Using explicit multiple-byte character sets defined in the |       Using explicit multiple-byte character sets defined in the | ||||||
|       <productname>Postgres</productname> server. These character sets |       <productname>Postgres</productname> server to support languages | ||||||
|       are also known to some client libraries. The number of character |       that require more characters than will fit into a single byte, | ||||||
|       sets is fixed at the time the server is compiled, and internal |       and to provide character set recoding between client and server. | ||||||
|       operations such as string comparisons require expansion of each |       The number of supported character sets is fixed at the time the | ||||||
|       character into a 32-bit word. |       server is compiled, and internal operations such as string | ||||||
|  |       comparisons require expansion of each character into a 32-bit | ||||||
|  |       word. | ||||||
|  |      </para> | ||||||
|  |     </listitem> | ||||||
|  |  | ||||||
|  |     <listitem> | ||||||
|  |      <para> | ||||||
|  |       Single byte character recoding provides a more light-weight | ||||||
|  |       solution for users of multiple, yet single-byte character sets. | ||||||
|      </para> |      </para> | ||||||
|     </listitem> |     </listitem> | ||||||
|    </itemizedlist> |    </itemizedlist> | ||||||
|   </para> |   </para> | ||||||
|  |  | ||||||
|  |  | ||||||
|  |  <sect1 id="locale"> | ||||||
|  |   <title>Locale Support</title> | ||||||
|  |    | ||||||
|  |   <para> | ||||||
|  |    <firstterm>Locale</> support refers to an application respecting | ||||||
|  |    cultural preferences regarding alphabets, sorting, number | ||||||
|  |    formatting, etc.  <productname>PostgreSQL</> uses the standard ISO | ||||||
|  |    C and POSIX-like locale facilities provided by the server operating | ||||||
|  |    system.  For additional information refer to the documentation of your | ||||||
|  |    system. | ||||||
|  |   </para> | ||||||
|  |  | ||||||
|  |   <sect2> | ||||||
|  |    <title>Overview</> | ||||||
|  |  | ||||||
|  |   <para> | ||||||
|  |     Locale support is not built into <productname>PostgreSQL</> by | ||||||
|  |     default; to enable it, supply the <option>--enable-locale</> option | ||||||
|  |     to the <filename>configure</> script: | ||||||
|  | <informalexample> | ||||||
|  | <screen> | ||||||
|  | <prompt>$ </><userinput>./configure --enable-locale</> | ||||||
|  | </screen> | ||||||
|  | </informalexample> | ||||||
|  |     Locale support only affects the server; all clients are compatible | ||||||
|  |     with servers with or without locale support. | ||||||
|  |    </para> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     The information about which particular cultural rules to use is | ||||||
|  |     determined by standard environment variables.  If you are getting | ||||||
|  |     localized behavior from other programs you probably have them set | ||||||
|  |     up already.  The simplest way to set the localization information | ||||||
|  |     is the <envar>LANG</> variable, for example: | ||||||
|  | <programlisting> | ||||||
|  | export LANG=sv_SE | ||||||
|  | </programlisting> | ||||||
|  |     This sets the locale to Swedish (<literal>sv</>) as spoken in | ||||||
|  |     Sweden (<literal>SE</>).  Other possibilities might be | ||||||
|  |     <literal>en_US</> (U.S. English) and <literal>fr_CA</> (Canada, | ||||||
|  |     French).  If more than one character set can be useful for a locale | ||||||
|  |     then the specifications look like this: | ||||||
|  |     <literal>cs_CZ.ISO8859-2</>. What locales are available under what | ||||||
|  |     names on your system depends on what was provided by the operating | ||||||
|  |     system vendor and what was installed. | ||||||
|  |    </para> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     Occasionally it is useful to mix rules from several locales, e.g., | ||||||
|  |     use U.S. rules but Spanish messages.  To do that a set of | ||||||
|  |     environment variables exist that override the default of | ||||||
|  |     <envar>LANG</> for a particular category: | ||||||
|  |  | ||||||
|  |     <informaltable> | ||||||
|  |      <tgroup cols="2"> | ||||||
|  |       <tbody> | ||||||
|  |        <row> | ||||||
|  |         <entry>LC_COLLATE</> | ||||||
|  |         <entry>String sort order</> | ||||||
|  |        </row> | ||||||
|  |        <row> | ||||||
|  |         <entry>LC_CTYPE</> | ||||||
|  |         <entry>Character classification (What is a letter? What is the upper-case equivalent of this letter?)</> | ||||||
|  |        </row> | ||||||
|  |        <row> | ||||||
|  |         <entry>LC_MESSAGES</> | ||||||
|  |         <entry>Language of messages</> | ||||||
|  |        </row> | ||||||
|  |        <row> | ||||||
|  |         <entry>LC_MONETARY</> | ||||||
|  |         <entry>Formatting of currency amounts</> | ||||||
|  |        </row> | ||||||
|  |        <row> | ||||||
|  |         <entry>LC_NUMERIC</> | ||||||
|  |         <entry>Formatting of numbers</> | ||||||
|  |        </row> | ||||||
|  |        <row> | ||||||
|  |         <entry>LC_TIME</> | ||||||
|  |         <entry>Formatting of dates and times</> | ||||||
|  |        </row> | ||||||
|  |       </tbody> | ||||||
|  |      </tgroup> | ||||||
|  |     </informaltable> | ||||||
|  |  | ||||||
|  |     <envar>LC_MESSAGES</> only affects the messages that come from the | ||||||
|  |     operating system, not <productname>PostgreSQL</>. | ||||||
|  |    </para> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     If you want the system to behave as if it had no locale support, | ||||||
|  |     use the special locale <literal>C</> or <literal>POSIX</>, or | ||||||
|  |     simply unset all locale related variables. | ||||||
|  |    </para> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     Once you have chosen a set of localization rules this way you must | ||||||
|  |     keep them fixed for any particular database cluster.  That means | ||||||
|  |     that the locales that were active when you ran <filename>initdb</> | ||||||
|  |     must be kept the same when you start the postmaster.  Otherwise, | ||||||
|  |     the changed sort order can corrupt indexes or make your data | ||||||
|  |     disappear mysteriously.  It is currently not possible to change the | ||||||
|  |     locales after database initialization or to use more than one set | ||||||
|  |     of locales for a given database cluster. | ||||||
|  |    </para> | ||||||
|  |   </sect2> | ||||||
|  |  | ||||||
|  |   <sect2> | ||||||
|  |    <title>Benefits</> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     Locale support influences in particular the following features: | ||||||
|  |  | ||||||
|  |     <itemizedlist> | ||||||
|  |      <listitem> | ||||||
|  |       <para> | ||||||
|  |        Sort order in <command>ORDER BY</> queries. | ||||||
|  |       </para> | ||||||
|  |      </listitem> | ||||||
|  |  | ||||||
|  |      <listitem> | ||||||
|  |       <para> | ||||||
|  |        The <function>to_char</> family of functions | ||||||
|  |       </para> | ||||||
|  |      </listitem> | ||||||
|  |  | ||||||
|  |      <listitem> | ||||||
|  |       <para> | ||||||
|  |        The <literal>LIKE</> and <literal>~</> operators for pattern | ||||||
|  |        matching | ||||||
|  |       </para> | ||||||
|  |      </listitem> | ||||||
|  |     </itemizedlist> | ||||||
|  |    </para> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     The only severe drawback of using the locale support in | ||||||
|  |     <productname>PostgreSQL</> is its speed.  So use locale only if you | ||||||
|  |     actually need it. | ||||||
|  |    </para> | ||||||
|  |   </sect2> | ||||||
|  |  | ||||||
|  |   <sect2> | ||||||
|  |    <title>Problems</> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     If locale support doesn't work in spite of the explanation above, | ||||||
|  |     check that the locale support in your operating system is okay. | ||||||
|  |     To check whether a given locale is installed and functional you | ||||||
|  |     can use <application>Perl</>, for example.  Perl also supports | ||||||
|  |     locales, and if a locale is broken <command>perl -v</> will | ||||||
|  |     complain with something like this: | ||||||
|  | <screen> | ||||||
|  | <prompt>$</> <userinput>export LC_CTYPE='not_exist'</> | ||||||
|  | <prompt>$</> <userinput>perl -v</> | ||||||
|  | <computeroutput> | ||||||
|  | perl: warning: Setting locale failed. | ||||||
|  | perl: warning: Please check that your locale settings: | ||||||
|  | LC_ALL = (unset), | ||||||
|  | LC_CTYPE = "not_exist", | ||||||
|  | LANG = (unset) | ||||||
|  | are supported and installed on your system. | ||||||
|  | perl: warning: Falling back to the standard locale ("C"). | ||||||
|  | </computeroutput> | ||||||
|  | </screen> | ||||||
|  |    </para> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     Check that your locale files are in the right location.  Possible | ||||||
|  |     locations include: <filename>/usr/lib/locale</filename> (Linux, | ||||||
|  |     Solaris), <filename>/usr/share/locale</filename> (Linux), | ||||||
|  |     <filename>/usr/lib/nls/loc</filename> (DUX 4.0).  Check the locale | ||||||
|  |     man page of your system if you are not sure. | ||||||
|  |    </para> | ||||||
|  |  | ||||||
|  |    <para> | ||||||
|  |     The directory <filename>src/test/locale</> contains a test suite | ||||||
|  |     for <productname>PostgreSQL</>'s locale support. | ||||||
|  |    </para> | ||||||
|  |   </sect2> | ||||||
|  |  </sect1> | ||||||
|  |  | ||||||
|  |  | ||||||
|   <sect1 id="multibyte"> |   <sect1 id="multibyte"> | ||||||
|    <title>Multi-byte Support</title> |    <title>Multibyte Support</title> | ||||||
|  |  | ||||||
|    <note> |    <note> | ||||||
|     <title>Author</title> |     <title>Author</title> | ||||||
| @@ -53,7 +244,7 @@ | |||||||
|    </note> |    </note> | ||||||
|  |  | ||||||
|    <para> |    <para> | ||||||
|     Multi-byte (<acronym>MB</acronym>) support is intended to allow |     Multibyte (<acronym>MB</acronym>) support is intended to allow | ||||||
|     <productname>Postgres</productname> to handle |     <productname>Postgres</productname> to handle | ||||||
|     multiple-byte character sets such as EUC (Extended Unix Code), Unicode and |     multiple-byte character sets such as EUC (Extended Unix Code), Unicode and | ||||||
|     Mule internal code. With <acronym>MB</acronym> enabled you can use multi-byte |     Mule internal code. With <acronym>MB</acronym> enabled you can use multi-byte | ||||||
| @@ -680,7 +871,78 @@ SET CLIENT_ENCODING = 'WIN1250'; | |||||||
|     </procedure> |     </procedure> | ||||||
|    </sect2> |    </sect2> | ||||||
|   </sect1> |   </sect1> | ||||||
|  </chapter> |  | ||||||
|  |  | ||||||
|  |  <sect1 id="recode"> | ||||||
|  |   <title>Single-byte character set recoding</> | ||||||
|  | <!-- formerly in README.charsets, by Josef Balatka, <balatka@email.cz> --> | ||||||
|  |  | ||||||
|  |   <para> | ||||||
|  |    You can set up this feature with the <option>--enable-recode</> option | ||||||
|  |    to <filename>configure</>. This option was formerly described as | ||||||
|  |    <quote>Cyrillic recode support</> which doesn't express all its | ||||||
|  |    power. It can be used for <emphasis>any</> single-byte character | ||||||
|  |    set recoding. | ||||||
|  |   </para> | ||||||
|  |  | ||||||
|  |   <para> | ||||||
|  |    This method uses a <filename>charset.conf</> file located in | ||||||
|  |    the database directory (<envar>PGDATA</>).  It's a typical | ||||||
|  |    configuration text file where spaces and newlines separate items | ||||||
|  |    and records and # specifies comments.  Three keywords with the | ||||||
|  |    following syntax are recognized here: | ||||||
|  | <synopsis> | ||||||
|  | BaseCharset      <replaceable>server_charset</> | ||||||
|  | RecodeTable      <replaceable>from_charset</> <replaceable>to_charset</> <replaceable>file_name</> | ||||||
|  | HostCharset      <replaceable>host_spec</>    <replaceable>host_charset</> | ||||||
|  | </synopsis> | ||||||
|  |   </para> | ||||||
|  |  | ||||||
|  |   <para> | ||||||
|  |    <token>BaseCharset</> defines the encoding of the database server. | ||||||
|  |    All character set names are only used for mapping inside of | ||||||
|  |    <filename>charset.conf</> so you can freely use typing-friendly | ||||||
|  |    names. | ||||||
|  |   </para> | ||||||
|  |  | ||||||
|  |   <para> | ||||||
|  |    <token>RecodeTable</> records specify translation tables between | ||||||
|  |    server and client.  The file name is relative to the | ||||||
|  |    <envar>PGDATA</> directory.  The table file format is very | ||||||
|  |    simple. There are no keywords and characters are represented by a | ||||||
|  |    pair of decimal or hexadecimal (0x prefixed) values on single | ||||||
|  |    lines: | ||||||
|  | <synopsis> | ||||||
|  | <replaceable>char_value</>   <replaceable>translated_char_value</> | ||||||
|  | </synopsis> | ||||||
|  |   </para> | ||||||
|  |  | ||||||
|  |   <para> | ||||||
|  |    <token>HostCharset</> records define the client character set by IP | ||||||
|  |    address. You can use a single IP address, an IP mask range starting | ||||||
|  |    from the given address or an IP interval (e.g., 127.0.0.1, | ||||||
|  |    192.168.1.100/24, 192.168.1.20-192.168.1.40). | ||||||
|  |   </para> | ||||||
|  |  | ||||||
|  |   <para> | ||||||
|  |    The <filename>charset.conf</> file is always processed up to the | ||||||
|  |    end, so you can easily specify exceptions from the previous | ||||||
|  |    rules. In the src/data you will find charset.conf example and a few | ||||||
|  |    recoding tables. | ||||||
|  |   </para> | ||||||
|  |  | ||||||
|  |   <para> | ||||||
|  |    As this solution is based on the client's IP address and character | ||||||
|  |    set mapping there are obviously some restrictions as well. You | ||||||
|  |    cannot use different encodings on the same host at the same | ||||||
|  |    time. It is also inconvenient when you boot your client hosts into | ||||||
|  |    multiple operating systems.  Nevertheless, when these restrictions are | ||||||
|  |    not limiting and you do not need multi-byte characters, then it is a | ||||||
|  |    simple and effective solution. | ||||||
|  |   </para> | ||||||
|  |  </sect1> | ||||||
|  |  | ||||||
|  | </chapter> | ||||||
|  |  | ||||||
| <!-- Keep this comment at the end of the file | <!-- Keep this comment at the end of the file | ||||||
| Local variables: | Local variables: | ||||||
|   | |||||||
| @@ -1,4 +1,4 @@ | |||||||
| <!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.21 2000/09/29 20:21:34 petere Exp $ --> | <!-- $Header: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v 1.22 2000/09/30 16:58:20 petere Exp $ --> | ||||||
|  |  | ||||||
| <chapter id="installation"> | <chapter id="installation"> | ||||||
|  <title><![%flattext-install-include[<productname>PostgreSQL</> ]]>Installation Instructions</title> |  <title><![%flattext-install-include[<productname>PostgreSQL</> ]]>Installation Instructions</title> | ||||||
| @@ -447,8 +447,9 @@ su - postgres | |||||||
|        <term>--enable-recode</term> |        <term>--enable-recode</term> | ||||||
|        <listitem> |        <listitem> | ||||||
|         <para> |         <para> | ||||||
|          Enables character set recode support. See |          Enables single-byte character set recode support. See | ||||||
|          <filename>doc/README.Charsets</> for details on this feature. |          <![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]> | ||||||
|  |          <![%flattext-install-ignore[<xref linkend="recode">]]> about this feature. | ||||||
|         </para> |         </para> | ||||||
|        </listitem> |        </listitem> | ||||||
|       </varlistentry> |       </varlistentry> | ||||||
| @@ -459,7 +460,10 @@ su - postgres | |||||||
|         <para> |         <para> | ||||||
|          Allows the use of multibyte character encodings. This is |          Allows the use of multibyte character encodings. This is | ||||||
|          primarily for languages like Japanese, Korean, and Chinese. |          primarily for languages like Japanese, Korean, and Chinese. | ||||||
|          Read <filename>doc/README.mb</> for details. |          Read  | ||||||
|  |          <![%flattext-install-include[the <citetitle>Administrator's Guide</citetitle>]]> | ||||||
|  |          <![%flattext-install-ignore[<xref linkend="multibyte">]]> | ||||||
|  |          for details. | ||||||
|         </para> |         </para> | ||||||
|        </listitem> |        </listitem> | ||||||
|       </varlistentry> |       </varlistentry> | ||||||
|   | |||||||
| @@ -1,5 +1,5 @@ | |||||||
| <!-- | <!-- | ||||||
| $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 thomas Exp $ | $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.42 2000/09/30 16:58:20 petere Exp $ | ||||||
| --> | --> | ||||||
|  |  | ||||||
| <!doctype set PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [ | <!doctype set PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [ | ||||||
| @@ -173,9 +173,9 @@ $Header: /cvsroot/pgsql/doc/src/sgml/postgres.sgml,v 1.41 2000/09/12 05:37:09 th | |||||||
| --> | --> | ||||||
|   &installation; |   &installation; | ||||||
|   &installw; |   &installw; | ||||||
|   &charset; |  | ||||||
|   &runtime; |   &runtime; | ||||||
|   &client-auth; |   &client-auth; | ||||||
|  |   &charset; | ||||||
|   &manage-ag; |   &manage-ag; | ||||||
|   &user-manag; |   &user-manag; | ||||||
|   &backup; |   &backup; | ||||||
|   | |||||||
| @@ -1,5 +1,5 @@ | |||||||
| <!-- | <!-- | ||||||
| $Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.25 2000/09/29 20:21:34 petere Exp $ | $Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.26 2000/09/30 16:58:20 petere Exp $ | ||||||
| --> | --> | ||||||
|  |  | ||||||
| <Chapter Id="runtime"> | <Chapter Id="runtime"> | ||||||
| @@ -1553,126 +1553,6 @@ set semsys:seminfo_semmsl=32 | |||||||
|  </sect1> |  </sect1> | ||||||
|  |  | ||||||
|  |  | ||||||
|  <sect1 id="locale"> |  | ||||||
|   <title>Locale Support</title> |  | ||||||
|    |  | ||||||
|   <note> |  | ||||||
|    <title>Acknowledgement</title> |  | ||||||
|    <para> |  | ||||||
|     Written by Oleg Bartunov. See <ulink |  | ||||||
|     url="http://www.sai.msu.su/~megera/postgres/">Oleg's web |  | ||||||
|     page</ulink> for additional information on locale and Russian |  | ||||||
|     language support. |  | ||||||
|    </para> |  | ||||||
|   </note> |  | ||||||
|  |  | ||||||
|   <para> |  | ||||||
|    While doing a project for a company in Moscow, Russia, I |  | ||||||
|    encountered the problem that <productname>Postgres</> had no |  | ||||||
|    support of national alphabets. After looking for possible |  | ||||||
|    workarounds I decided to develop support of locale myself. I'm not |  | ||||||
|    a C programmer but already had some experience with locale |  | ||||||
|    programming when I work with <productname>Perl</> (debugging) and |  | ||||||
|    <productname>Glimpse</>. After several days of digging through the |  | ||||||
|    <productname>Postgres</> source tree I made very minor corections |  | ||||||
|    to <filename>src/backend/utils/adt/varlena.c</> and |  | ||||||
|    <filename>src/backend/main/main.c</> and got what I needed! I did |  | ||||||
|    support only for <envar>LC_CTYPE</envar> and |  | ||||||
|    <envar>LC_COLLATE</envar>, but later <envar>LC_MONETARY</envar> was |  | ||||||
|    added by others. I got many messages from people about this patch |  | ||||||
|    so I decided to send it to developers and (to my surprise) it was |  | ||||||
|    incorporated into the <productname>Postgres</> distribution. |  | ||||||
|   </para> |  | ||||||
|  |  | ||||||
|   <para> |  | ||||||
|    People often complain that locale doesn't work for them. There are |  | ||||||
|    several common mistakes: |  | ||||||
|     |  | ||||||
|    <itemizedlist> |  | ||||||
|     <listitem> |  | ||||||
|      <para> |  | ||||||
|       Didn't properly configure <productname>Postgres</> before |  | ||||||
|       compilation. You must run <filename>configure</> with the |  | ||||||
|       <option>--enable-locale</> option to enable locale support. |  | ||||||
|      </para> |  | ||||||
|     </listitem> |  | ||||||
|  |  | ||||||
|     <listitem> |  | ||||||
|      <para> |  | ||||||
|       Didn't setup environment correctly when starting postmaster. You |  | ||||||
|       must define environment variables <envar>LC_CTYPE</envar> and |  | ||||||
|       <envar>LC_COLLATE</envar> before running postmaster because |  | ||||||
|       backend gets information about locale from environment. I use |  | ||||||
|       following shell script: |  | ||||||
| <programlisting> |  | ||||||
| #!/bin/sh |  | ||||||
|  |  | ||||||
| export LC_CTYPE=koi8-r |  | ||||||
| export LC_COLLATE=koi8-r |  | ||||||
| postmaster -B 1024 -S -D/usr/local/pgsql/data/ -o '-Fe' |  | ||||||
| </programlisting> |  | ||||||
|      </para> |  | ||||||
|     </listitem> |  | ||||||
|  |  | ||||||
|     <listitem> |  | ||||||
|      <para> |  | ||||||
|       Broken locale support in the operating system (for example, |  | ||||||
|       locale support in libc under Linux several times has changed and |  | ||||||
|       this caused a lot of problems). Perl has also support of locale |  | ||||||
|       and if locale is broken <command>perl -v</> will complain |  | ||||||
|       something like: |  | ||||||
| <screen> |  | ||||||
| <prompt>$</> <userinput>export LC_CTYPE='not_exist'</> |  | ||||||
| <prompt>$</> <userinput>perl -v</> |  | ||||||
| <computeroutput> |  | ||||||
| perl: warning: Setting locale failed. |  | ||||||
| perl: warning: Please check that your locale settings: |  | ||||||
| LC_ALL = (unset), |  | ||||||
| LC_CTYPE = "not_exist", |  | ||||||
| LANG = (unset) |  | ||||||
| are supported and installed on your system. |  | ||||||
| perl: warning: Falling back to the standard locale ("C"). |  | ||||||
| </computeroutput> |  | ||||||
| </screen> |  | ||||||
|      </para> |  | ||||||
|     </listitem> |  | ||||||
|  |  | ||||||
|     <listitem> |  | ||||||
|      <para> |  | ||||||
|       Wrong location of locale files. Possible locations include: |  | ||||||
|       <filename>/usr/lib/locale</filename> (Linux, Solaris), |  | ||||||
|       <filename>/usr/share/locale</filename> (Linux), |  | ||||||
|       <filename>/usr/lib/nls/loc</filename> (DUX 4.0). |  | ||||||
|        |  | ||||||
|       Check <command>man locale</command> to find the correct |  | ||||||
|       location. Under Linux I made a symbolic link between |  | ||||||
|       <filename>/usr/lib/locale</filename> and |  | ||||||
|       <filename>/usr/share/locale</filename> to be sure that the next |  | ||||||
|       libc will not break my locale. |  | ||||||
|      </para> |  | ||||||
|     </listitem> |  | ||||||
|    </itemizedlist> |  | ||||||
|   </para> |  | ||||||
|  |  | ||||||
|   <formalpara> |  | ||||||
|    <title>What are the Benefits?</title>  |  | ||||||
|    <para> |  | ||||||
|     You can use ~* and order by operators for strings contain |  | ||||||
|     characters from national alphabets. Non-english users definitely |  | ||||||
|     need that. |  | ||||||
|    </para> |  | ||||||
|   </formalpara> |  | ||||||
|  |  | ||||||
|   <formalpara> |  | ||||||
|    <title>What are the Drawbacks?</title> |  | ||||||
|    <para> |  | ||||||
|     There is one evident drawback of using locale - its speed! So, use |  | ||||||
|     locale only if you really need it. |  | ||||||
|    </para> |  | ||||||
|   </formalpara> |  | ||||||
|  </sect1> |  | ||||||
|  |  | ||||||
|  |  | ||||||
|  <sect1 id="postmaster-shutdown"> |  <sect1 id="postmaster-shutdown"> | ||||||
|   <title>Shutting down the server</title> |   <title>Shutting down the server</title> | ||||||
|  |  | ||||||
|   | |||||||