mirror of https://github.com/postgres/postgres.git synced 2025-12-22 17:42:17 +03:00

Add option to use ICU as global locale provider

This adds the option to use ICU as the default locale provider for
either the whole cluster or a database.  New options for initdb,
createdb, and CREATE DATABASE are used to select this.
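
As a rough illustration of the three entry points, the sketch below shows
how the new options might be combined.  The data directory, database names,
and the ICU locale ID "en-US" are placeholders, not taken from this commit.
Using template0 as the template is an assumption: the libc-based template1
may not be usable as the source for a database with a different provider.
The helper only prints the commands unless RUN=1 is set, so it is safe to
paste without a server at hand.

```shell
# Dry-run helper: prints the command line by default.
# Set RUN=1 to execute against a PostgreSQL build that includes ICU support.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Placeholder values (assumptions for illustration only).
PGDATA_DIR=/tmp/pgdata-icu
ICU_ID=en-US

# Cluster-wide default: every database created later inherits ICU.
run initdb -D "$PGDATA_DIR" --locale-provider=icu --icu-locale="$ICU_ID"

# Per-database override through the createdb wrapper.
run createdb --template=template0 --locale-provider=icu --icu-locale="$ICU_ID" demo_icu

# Same override through SQL.
run psql -X -d postgres -c "CREATE DATABASE demo_icu_sql TEMPLATE template0 LOCALE_PROVIDER icu ICU_LOCALE '$ICU_ID'"
```

When the cluster itself was initialized with ICU, the per-database options
are only needed to pick a different ICU locale or to switch back to libc.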

Since some (legacy) code still uses the libc locale facilities
directly, we still need to set the libc global locale settings even if
ICU is otherwise selected.  So pg_database now has three
locale-related fields: the existing datcollate and datctype, which are
always set, and a new daticulocale, which is only set if ICU is
selected.  A similar change is made in pg_collation for consistency,
but in that case, only the libc-related fields or the ICU-related
field is set, never both.
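
One way to check the resulting catalog state is to list the three
locale-related fields side by side.  This is a minimal sketch that assumes a
server with this commit applied and a reachable "postgres" database; the
connection options are placeholders.  For a database using libc,
daticulocale is expected to be null, while datcollate and datctype are
populated in both cases.

```shell
# Columns are the ones named in the commit message above.
query="SELECT datname, datcollate, datctype, daticulocale FROM pg_database ORDER BY datname;"

# Print the psql invocation by default; set RUN=1 to run it on a live server.
if [ "${RUN:-0}" = "1" ]; then
  psql -X -d postgres -c "$query"
else
  printf 'psql -X -d postgres -c "%s"\n' "$query"
fi
```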

Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/5e756dd6-0e91-d778-96fd-b1bcb06c161a%402ndquadrant.com
Peter Eisentraut
2022-03-17 11:11:21 +01:00
parent f6f0db4d62
commit f2553d4306
35 changed files with 946 additions and 166 deletions


@@ -28,6 +28,8 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
[ LOCALE [=] <replaceable class="parameter">locale</replaceable> ]
[ LC_COLLATE [=] <replaceable class="parameter">lc_collate</replaceable> ]
[ LC_CTYPE [=] <replaceable class="parameter">lc_ctype</replaceable> ]
[ ICU_LOCALE [=] <replaceable class="parameter">icu_locale</replaceable> ]
[ LOCALE_PROVIDER [=] <replaceable class="parameter">locale_provider</replaceable> ]
[ COLLATION_VERSION = <replaceable>collation_version</replaceable> ]
[ TABLESPACE [=] <replaceable class="parameter">tablespace_name</replaceable> ]
[ ALLOW_CONNECTIONS [=] <replaceable class="parameter">allowconn</replaceable> ]
@@ -160,6 +162,29 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable class="parameter">icu_locale</replaceable></term>
<listitem>
<para>
Specifies the ICU locale ID if the ICU locale provider is used.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable>locale_provider</replaceable></term>
<listitem>
<para>
Specifies the provider to use for the default collation in this
database. Possible values are:
<literal>icu</literal>,<indexterm><primary>ICU</primary></indexterm>
<literal>libc</literal>. <literal>libc</literal> is the default. The
available choices depend on the operating system and build options.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><replaceable>collation_version</replaceable></term>
@@ -314,6 +339,13 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
indexes that would be affected.
</para>
<para>
There is currently no option to use a database locale with nondeterministic
comparisons (see <link linkend="sql-createcollation"><command>CREATE
COLLATION</command></link> for an explanation). If this is needed, then
per-column collations would need to be used.
</para>
<para>
The <literal>CONNECTION LIMIT</literal> option is only enforced approximately;
if two new sessions start at about the same time when just one


@@ -147,6 +147,25 @@ PostgreSQL documentation
</listitem>
</varlistentry>
<varlistentry>
<term><option>--icu-locale=<replaceable class="parameter">locale</replaceable></option></term>
<listitem>
<para>
Specifies the ICU locale ID to be used in this database, if the
ICU locale provider is selected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
<listitem>
<para>
Specifies the locale provider for the database's default collation.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-O <replaceable class="parameter">owner</replaceable></option></term>
<term><option>--owner=<replaceable class="parameter">owner</replaceable></option></term>


@@ -86,30 +86,45 @@ PostgreSQL documentation
</para>
<para>
<command>initdb</command> initializes the database cluster's default
locale and character set encoding. The character set encoding,
collation order (<literal>LC_COLLATE</literal>) and character set classes
(<literal>LC_CTYPE</literal>, e.g., upper, lower, digit) can be set separately
for a database when it is created. <command>initdb</command> determines
those settings for the template databases, which will
serve as the default for all other databases.
<command>initdb</command> initializes the database cluster's default locale
and character set encoding. These can also be set separately for each
database when it is created. <command>initdb</command> determines those
settings for the template databases, which will serve as the default for
all other databases. By default, <command>initdb</command> uses the
locale provider <literal>libc</literal>, takes the locale settings from
the environment, and determines the encoding from the locale settings.
This is almost always sufficient, unless there are special requirements.
</para>
<para>
To alter the default collation order or character set classes, use the
<option>--lc-collate</option> and <option>--lc-ctype</option> options.
Collation orders other than <literal>C</literal> or <literal>POSIX</literal> also have
a performance penalty. For these reasons it is important to choose the
right locale when running <command>initdb</command>.
To choose a different locale for the cluster, use the option
<option>--locale</option>. There are also individual options
<option>--lc-*</option> (see below) to set values for the individual locale
categories. Note that inconsistent settings for different locale
categories can give nonsensical results, so this should be used with care.
</para>
<para>
The remaining locale categories can be changed later when the server
is started. You can also use <option>--locale</option> to set the
default for all locale categories, including collation order and
character set classes. All server locale values (<literal>lc_*</literal>) can
be displayed via <command>SHOW ALL</command>.
More details can be found in <xref linkend="locale"/>.
Alternatively, the ICU library can be used to provide locale services.
(Again, this only sets the default for subsequently created databases.) To
select this option, specify <literal>--locale-provider=icu</literal>.
To choose the specific ICU locale ID to apply, use the option
<option>--icu-locale</option>. Note that
for implementation reasons and to support legacy code,
<command>initdb</command> will still select and initialize libc locale
settings when the ICU locale provider is used.
</para>
<para>
When <command>initdb</command> runs, it will print out the locale settings
it has chosen. If you have complex requirements or specified multiple
options, it is advisable to check that the result matches what was
intended.
</para>
<para>
More details about locale settings can be found in <xref
linkend="locale"/>.
</para>
<para>
@@ -210,6 +225,15 @@ PostgreSQL documentation
</listitem>
</varlistentry>
<varlistentry>
<term><option>--icu-locale=<replaceable>locale</replaceable></option></term>
<listitem>
<para>
Specifies the ICU locale ID, if the ICU locale provider is used.
</para>
</listitem>
</varlistentry>
<varlistentry id="app-initdb-data-checksums" xreflabel="data checksums">
<term><option>-k</option></term>
<term><option>--data-checksums</option></term>
@@ -264,6 +288,18 @@ PostgreSQL documentation
</listitem>
</varlistentry>
<varlistentry>
<term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
<listitem>
<para>
This option sets the locale provider for databases created in the
new cluster. It can be overridden in the <command>CREATE
DATABASE</command> command when new databases are subsequently
created. The default is <literal>libc</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-N</option></term>
<term><option>--no-sync</option></term>