mirror of https://github.com/postgres/postgres.git synced 2025-08-28 18:48:04 +03:00

Support PG_UNICODE_FAST locale in the builtin collation provider.

The PG_UNICODE_FAST locale uses code point sort order (fast,
memcmp-based) combined with Unicode character semantics. The character
semantics are based on Unicode full case mapping.
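
As an illustration of why code point order can be obtained with a plain
byte comparison: UTF-8 is designed so that comparing encoded strings
bytewise yields the same order as comparing their code points. The sketch
below uses Python only to demonstrate that property; it is not PostgreSQL
code, and the helper name code_point_sort and the sample words are
invented for this example.

```python
def code_point_sort(words):
    """Sort strings by their UTF-8 bytes, i.e. a memcmp-style comparison."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


words = ["zebra", "\u00c9clair", "apple", "Zebra"]

# Bytewise order of the UTF-8 encodings.
by_bytes = code_point_sort(words)

# Python compares str values code point by code point.
by_code_points = sorted(words)

# Both orders agree: uppercase ASCII, then lowercase ASCII, then non-ASCII.
print(by_bytes)
print(by_bytes == by_code_points)
```

Note that this order is not linguistic: "Zebra" sorts before "apple", and
accented letters sort after all ASCII letters.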

Full case mapping can map a single code point to multiple code points,
such as "ß" uppercasing to "SS". Additionally, it handles
context-sensitive mappings like the "final sigma", and it uses
titlecase mappings such as "Dž" when titlecasing (rather than plain
uppercase mappings).
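
The three behaviors above can be seen in any implementation of Unicode
full case mapping. The sketch below uses Python's built-in str methods,
which also apply full case mapping; it is only an analogy and does not
show PostgreSQL's implementation or its SQL interface. The variable names
are invented for this example.

```python
# One code point in, two code points out: U+00DF (sharp s) uppercases to "SS".
upper_eszett = "\u00df".upper()
upper_word = "stra\u00dfe".upper()

# Context-sensitive mapping: capital sigma (U+03A3) lowercases to the
# final form U+03C2 at the end of a word, and to U+03C3 elsewhere.
word_final = "\u039f\u0394\u039f\u03a3".lower()   # sigma is the last letter
word_medial = "\u03a3\u039f\u03a3".lower()        # first sigma is not final

# Titlecase differs from uppercase for digraphs such as U+01C6 (dz with caron).
dz = "\u01c6"
title_dz = dz.title()   # U+01C5, capital D with small z with caron
upper_dz = dz.upper()   # U+01C4, both letters capitalized

print(upper_eszett, upper_word)
print(word_final, word_medial)
print(title_dz, upper_dz)
```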

Importantly, the uppercasing of "ß" as "SS" is specifically mentioned
by the SQL standard. In Postgres, UCS_BASIC uses plain ASCII semantics
for case mapping and pattern matching, so if we changed it to use the
PG_UNICODE_FAST locale, it would offer better compliance with the
standard. For now, though, do not change the behavior of UCS_BASIC.

Discussion: https://postgr.es/m/ddfd67928818f138f51635712529bc5e1d25e4e7.camel@j-davis.com
Discussion: https://postgr.es/m/27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org
Reviewed-by: Peter Eisentraut, Daniel Verite
Jeff Davis
2025-01-17 15:56:30 -08:00
parent 286a365b9c
commit d3d0983169
13 changed files with 283 additions and 16 deletions

@@ -99,7 +99,8 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
<para>
If <replaceable>provider</replaceable> is <literal>builtin</literal>,
then <replaceable>locale</replaceable> must be specified and set to
-either <literal>C</literal> or <literal>C.UTF-8</literal>.
+either <literal>C</literal>, <literal>C.UTF-8</literal> or
+<literal>PG_UNICODE_FAST</literal>.
</para>
</listitem>
</varlistentry>

@@ -168,7 +168,8 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
If <xref linkend="create-database-locale-provider"/> is
<literal>builtin</literal>, then <replaceable>locale</replaceable> or
<replaceable>builtin_locale</replaceable> must be specified and set to
-either <literal>C</literal> or <literal>C.UTF-8</literal>.
+either <literal>C</literal>, <literal>C.UTF-8</literal>, or
+<literal>PG_UNICODE_FAST</literal>.
</para>
<tip>
<para>
@@ -233,7 +234,8 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
</para>
<para>
The locales available for the <literal>builtin</literal> provider are
-<literal>C</literal> and <literal>C.UTF-8</literal>.
+<literal>C</literal>, <literal>C.UTF-8</literal> and
+<literal>PG_UNICODE_FAST</literal>.
</para>
</listitem>
</varlistentry>

@@ -295,8 +295,8 @@ PostgreSQL documentation
<para>
If <option>--locale-provider</option> is <literal>builtin</literal>,
<option>--locale</option> or <option>--builtin-locale</option> must be
-specified and set to <literal>C</literal> or
-<literal>C.UTF-8</literal>.
+specified and set to <literal>C</literal>, <literal>C.UTF-8</literal>
+or <literal>PG_UNICODE_FAST</literal>.
</para>
</listitem>
</varlistentry>