mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Introduce "builtin" collation provider.

New provider for collations, like "libc" or "icu", but without any
external dependency.

Initially, the only locale supported by the builtin provider is "C",
which is identical to the libc provider's "C" locale. The libc
provider's "C" locale has always been treated as a special case that
uses an internal implementation, without using libc at all -- so the
new builtin provider uses the same implementation.

The builtin provider's locale is independent of the server environment
variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the
database collation locale can be "C" while LC_COLLATE and LC_CTYPE are
set to "en_US", which is impossible with the libc provider.

Offering a new builtin provider makes it clear that the semantics of
a collation using this provider will never depend on libc, and makes
it easier to document the behavior.

Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com
Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org
Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider
Jeff Davis
2024-03-13 23:33:44 -07:00
parent 6ab2e8385d
commit 2d819a08a1
25 changed files with 671 additions and 158 deletions

@@ -342,22 +342,14 @@ initdb --locale=sv_SE
<title>Locale Providers</title>
<para>
<productname>PostgreSQL</productname> supports multiple <firstterm>locale
providers</firstterm>. This specifies which library supplies the locale
data. One standard provider name is <literal>libc</literal>, which uses
the locales provided by the operating system C library. These are the
locales used by most tools provided by the operating system. Another
provider is <literal>icu</literal>, which uses the external
ICU<indexterm><primary>ICU</primary></indexterm> library. ICU locales can
only be used if support for ICU was configured when PostgreSQL was built.
A locale provider specifies which library defines the locale behavior for
collations and character classifications.
</para>
<para>
The commands and tools that select the locale settings, as described
above, each have an option to select the locale provider. Here is an
example to initialize a database cluster using the ICU provider:
<programlisting>
initdb --locale-provider=icu --icu-locale=en
</programlisting>
@@ -370,12 +362,76 @@ initdb --locale-provider=icu --icu-locale=en
</para>
<para>
Which locale provider to use depends on individual requirements. For most
basic uses, either the libc or the ICU provider will give adequate results. For the libc
provider, it depends on what the operating system offers; some operating
systems are better than others. For advanced uses, ICU offers more locale
variants and customization options.
Regardless of the locale provider, the operating system is still used to
provide some locale-aware behavior, such as messages (see <xref
linkend="guc-lc-messages"/>).
</para>
<para>
The available locale providers are listed below:
</para>
<variablelist>
<varlistentry>
<term><literal>builtin</literal></term>
<listitem>
<para>
The <literal>builtin</literal> provider uses built-in operations. Only
the <literal>C</literal> locale is supported for this provider.
</para>
<para>
The <literal>C</literal> locale behavior is identical to the
<literal>C</literal> locale in the libc provider. When using this
locale, the behavior may depend on the database encoding.
</para>
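<para>
For example, following the pattern of the ICU example above, a database
cluster whose default collation uses the builtin provider could be
initialized with a command along these lines. This is a sketch that, like
the other examples in this section, assumes the data directory is supplied
through <envar>PGDATA</envar> or the <option>-D</option> option:
<programlisting>
initdb --locale-provider=builtin --locale=C
</programlisting>
</para>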
</listitem>
</varlistentry>
<varlistentry>
<term><literal>icu</literal></term>
<listitem>
<para>
The <literal>icu</literal> provider uses the external
ICU<indexterm><primary>ICU</primary></indexterm>
library. <productname>PostgreSQL</productname> must have been
configured with ICU support.
</para>
<para>
ICU provides collation and character classification behavior that is
independent of the operating system and database encoding, which is
preferable if you expect to transition to other platforms without any
change in results. <literal>LC_COLLATE</literal> and
<literal>LC_CTYPE</literal> can be set independently of the ICU
locale.
</para>
<note>
<para>
For the ICU provider, results may depend on the version of the ICU
library used, as it is updated to reflect changes in natural language
over time.
</para>
</note>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>libc</literal></term>
<listitem>
<para>
The <literal>libc</literal> provider uses the operating system's C
library. The collation and character classification behavior is
controlled by the settings <literal>LC_COLLATE</literal> and
<literal>LC_CTYPE</literal>, so, unlike with the builtin and ICU providers,
the locale cannot be chosen independently of those settings.
</para>
<note>
<para>
The same locale name may have different behavior on different
platforms when using the libc provider.
</para>
</note>
</listitem>
</varlistentry>
</variablelist>
</sect2>
<sect2 id="icu-locales">

@@ -96,6 +96,11 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
<replaceable>locale</replaceable>, you cannot specify either of those
parameters.
</para>
<para>
If <replaceable>provider</replaceable> is <literal>builtin</literal>,
then <replaceable>locale</replaceable> must be specified and set to
<literal>C</literal>.
</para>
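<para>
For example, the following sketch creates a collation with the builtin
provider and then uses it to sort two strings. It assumes that
<application>psql</application> can connect to a database on a server that
includes the builtin provider; the collation name
<literal>builtin_c</literal> is arbitrary. Because the
<literal>C</literal> locale compares strings by byte value,
<literal>Banana</literal> sorts before <literal>apple</literal>:
<programlisting>
psql -c "CREATE COLLATION builtin_c (provider = builtin, locale = 'C');"
psql -c "SELECT x FROM (VALUES ('apple'), ('Banana')) AS t(x) ORDER BY x COLLATE builtin_c;"
</programlisting>
</para>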
</listitem>
</varlistentry>
@@ -129,9 +134,9 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
<listitem>
<para>
Specifies the provider to use for locale services associated with this
collation. Possible values are <literal>builtin</literal>,
<literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
the server was built with ICU support) or <literal>libc</literal>.
<literal>libc</literal> is the default. See <xref
linkend="locale-providers"/> for details.
</para>

@@ -162,6 +162,11 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
linkend="create-database-lc-ctype"/>, or <xref
linkend="create-database-icu-locale"/> individually.
</para>
<para>
If <xref linkend="create-database-locale-provider"/> is
<literal>builtin</literal>, then <replaceable>locale</replaceable>
must be specified and set to <literal>C</literal>.
</para>
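<para>
For example, a database whose default collation uses the builtin provider
might be created as in the following sketch, run through
<application>psql</application>. The database name
<literal>builtin_db</literal> is arbitrary, and
<literal>template0</literal> is used as the template on the assumption
that the locale provider or locale of the default template database
differs from the requested ones:
<programlisting>
psql -c "CREATE DATABASE builtin_db TEMPLATE template0 LOCALE_PROVIDER builtin LOCALE 'C';"
</programlisting>
</para>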
<tip>
<para>
The other locale settings <xref linkend="guc-lc-messages"/>, <xref
@@ -243,7 +248,7 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
<listitem>
<para>
Specifies the provider to use for the default collation in this
database. Possible values are <literal>builtin</literal>,
<literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
(if the server was built with ICU support) or <literal>libc</literal>.
By default, the provider is the same as that of the <xref

@@ -171,7 +171,7 @@ PostgreSQL documentation
</varlistentry>
<varlistentry>
<term><option>--locale-provider={<literal>builtin</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
<listitem>
<para>
Specifies the locale provider for the database's default collation.

@@ -286,6 +286,11 @@ PostgreSQL documentation
environment that <command>initdb</command> runs in. Locale
support is described in <xref linkend="locale"/>.
</para>
<para>
If <option>--locale-provider</option> is <literal>builtin</literal>,
<option>--locale</option> must be specified and set to
<literal>C</literal>.
</para>
</listitem>
</varlistentry>
@@ -314,8 +319,18 @@ PostgreSQL documentation
</listitem>
</varlistentry>
<varlistentry id="app-initdb-builtin-locale">
<term><option>--builtin-locale=<replaceable>locale</replaceable></option></term>
<listitem>
<para>
Specifies the locale name when the builtin provider is used. Locale support
is described in <xref linkend="locale"/>.
</para>
</listitem>
</varlistentry>
<varlistentry id="app-initdb-option-locale-provider">
<term><option>--locale-provider={<literal>builtin</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
<listitem>
<para>
This option sets the locale provider for databases created in the new