mirror of https://github.com/postgres/postgres.git (synced 2025-07-27 12:41:57 +03:00)
I really hope that I haven't missed anything in this one...
From: t-ishii@sra.co.jp

Attached are patches to enhance the multi-byte support. (patches are
against 7/18 snapshot)

* determine encoding at initdb/createdb rather than compile time

  Now initdb/createdb has an option to specify the encoding. Also, I
  modified the syntax of CREATE DATABASE to accept an encoding option.
  See README.mb for more details.

  For this purpose I have added a new column "encoding" to pg_database.
  Also pg_attribute and pg_class are changed to catch up with the
  modification to pg_database. Actually I have added pg_database_mb.h,
  pg_attribute_mb.h and pg_class_mb.h. These are used only when MB is
  enabled. The reason for having separate files is that I couldn't find
  a way to use ifdef or whatever in those files. I have to admit it
  looks ugly. No way.

* support for PGCLIENTENCODING when issuing COPY command

  commands/copy.c modified.

* support for SQL92 syntax "SET NAMES"

  See gram.y.

* support for LATIN2-5

* add UNICODE regression test case

* new test suite for MB

  New directory test/mb added.

* clean up source files

  Basic idea is to have MB's own subdirectory for easier maintenance.
  These are include/mb and backend/utils/mb.
@@ -1,4 +1,4 @@
-postgresql 6.4 multi-byte (MB) support README         Jun 5 1998
+postgresql 6.4 multi-byte (MB) support README         Jul 22 1998

 Tatsuo Ishii
 t-ishii@sra.co.jp
@@ -10,7 +10,10 @@ The MB support is intended for allowing PostgreSQL to handle
 multi-byte character sets such as EUC(Extended Unix Code), Unicode and
 Mule internal code. With the MB enabled you can use multi-byte
 character sets in regexp ,LIKE and some functions. The encoding system
-chosen is determined at the compile time.
+chosen is determined when initializing your PostgreSQL installation
+using initdb(1). Note that this can be overrided when creating a
+database using createdb(1) or create database SQL command. So you
+could have multiple databases with different encoding system.

 MB also fixes some problems concerning with 8-bit single byte
 character sets including ISO8859. (I would not say all of problems
@@ -36,7 +39,11 @@ where encoding_system is one of:
 EUC_TW          Taiwan EUC
 UNICODE         Unicode(UTF-8)
 MULE_INTERNAL   Mule internal
-LATIN1          ISO 8859-1 English and some European laguages
+LATIN1          ISO 8859-1 English and some European languages
+LATIN2          ISO 8859-2 English and some European languages
+LATIN3          ISO 8859-3 English and some European languages
+LATIN4          ISO 8859-4 English and some European languages
+LATIN5          ISO 8859-5 English and some European languages

 Example:

@@ -50,7 +57,28 @@ Example:
 If MB is disabled, nothing is changed except better supporting for
 8-bit single byte character sets.

-2. PGCLIENTENCODING
+2. How to set encoding
+
+initdb command defines the default encoding for a PostgreSQL
+installation. For example:
+
+% initdb -e EUC_JP
+
+sets the default encoding to EUC_JP(Extended Unix Code for Japanese).
+Note that you can use "-pgencoding" instead of "-e" if you like longer
+option string:-) If no -e or -pgencoding option is given, the encoding
+specified at the compile time is used.
+
+You can create a database with a different encoding.
+
+% createdb -E EUC_KR korean
+
+will create a database named "korean" with EUC_KR encoding. The
+another way to accomplish this is to use a SQL command:
+
+CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
+
+3. PGCLIENTENCODING

 If an environment variable PGCLIENTENCODING is defined on the
 frontend, automatic encoding translation is done by the backend. For
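Review note on the hunk above: the new "How to set encoding" text describes a three-level precedence (compile-time default, then initdb -e, then createdb -E or CREATE DATABASE ... WITH ENCODING). The Python sketch below only models that rule as the README states it. The function name resolve_encoding and its parameters are hypothetical and are not part of PostgreSQL.

```python
from typing import Optional


def resolve_encoding(compile_time_default: str,
                     initdb_encoding: Optional[str] = None,
                     createdb_encoding: Optional[str] = None) -> str:
    """Model the precedence described in README.mb (hypothetical helper).

    compile_time_default: encoding chosen when the server was built.
    initdb_encoding:      value passed to `initdb -e`, if any.
    createdb_encoding:    value passed to `createdb -E` or
                          CREATE DATABASE ... WITH ENCODING, if any.
    """
    # Without -e / -pgencoding, initdb falls back to the compile-time value.
    installation_default = initdb_encoding or compile_time_default
    # A per-database encoding overrides the installation default.
    return createdb_encoding or installation_default


if __name__ == "__main__":
    # Installation initialized with `initdb -e EUC_JP`,
    # then `createdb -E EUC_KR korean`.
    print(resolve_encoding("LATIN1", initdb_encoding="EUC_JP"))
    print(resolve_encoding("LATIN1", "EUC_JP", createdb_encoding="EUC_KR"))
```

With these rules a single installation can hold an EUC_JP database next to an EUC_KR one, which is the situation the README text describes.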
@@ -68,7 +96,11 @@ Supported encodings for PGCLIENTENCODING are:
 EUC_KR          Korean EUC
 EUC_TW          Taiwan EUC
 MULE_INTERNAL   Mule internal
-LATIN1          ISO 8859-1 English and some European laguages
+LATIN1          ISO 8859-1 English and some European languages
+LATIN2          ISO 8859-2 English and some European languages
+LATIN3          ISO 8859-3 English and some European languages
+LATIN4          ISO 8859-4 English and some European languages
+LATIN5          ISO 8859-5 English and some European languages

 Note that UNICODE is not supported(yet). Also note that the
 translation is not always possible. Suppose you choose EUC_JP for the
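Review note on the hunk above: the context lines warn that translation between backend and frontend encodings is not always possible. The Python sketch below illustrates why, using Python's built-in codecs as a stand-in for the backend's conversion routines (an assumption: it does not call PostgreSQL, and can_translate is a hypothetical helper). Japanese text stored as EUC-JP can be re-encoded into another Japanese encoding, but it has no representation in Latin-1.

```python
def can_translate(text: str, client_codec: str) -> bool:
    """Return True if every character of `text` exists in `client_codec`."""
    try:
        text.encode(client_codec)
        return True
    except UnicodeEncodeError:
        return False


# Bytes as a database using a Japanese EUC encoding might hold them.
stored = "日本語".encode("euc_jp")
decoded = stored.decode("euc_jp")

print(can_translate(decoded, "shift_jis"))  # True: another Japanese encoding
print(can_translate(decoded, "latin-1"))    # False: no Latin-1 equivalents
```

A real client/server pair has to decide what to do in the second case (reject, substitute, or pass bytes through); the sketch only shows that the mismatch exists.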
@@ -86,7 +118,12 @@ new command:
 SET CLIENT_ENCODING TO 'encoding';

 where encoding is one of the encodings those can be set to
-PGCLIENTENCODING. To query the current the frontend encoding:
+PGCLIENTENCODING. Also you can use SQL92 syntax "SET NAMES" for this
+purpose:
+
+SET NAMES 'encoding';
+
+To query the current the frontend encoding:

 SHOW CLIENT_ENCODING;

@@ -114,7 +151,16 @@ Unicode: http://www.unicode.org/

 5. History

-Jun 5, 1988
+Jul 22, 1998
+* determine encoding at initdb/createdb rather than compile time
+* support for PGCLIENTENCODING when issuing COPY command
+* support for SQL92 syntax "SET NAMES"
+* support for LATIN2-5
+* add UNICODE regression test case
+* new test suite for MB
+* clean up source files
+
+Jun 5, 1998
 * add support for the encoding translation between the backend
   and the frontend
 * new command SET CLIENT_ENCODING etc. added