diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 44e43503a61..63f7de5b438 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -515,7 +515,7 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
A collation object provided by libc maps to a
combination of LC_COLLATE and LC_CTYPE
- settings. (As
+ settings, as accepted by the setlocale() system library call. (As
the name would suggest, the main purpose of a collation is to set
LC_COLLATE, which controls the sort order. But
it is rarely necessary in practice to have an
@@ -640,21 +640,19 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
ICU collations
- Collations provided by ICU are created with names in BCP 47 language tag
+ With ICU, it is not sensible to enumerate all possible locale names. ICU
+ uses a particular naming system for locales, but there are many more ways
+ to name a locale than there are actually distinct locales.
+ initdb uses the ICU APIs to extract a set of distinct
+ locales to populate the initial set of collations. Collations provided by
+ ICU are created in the SQL environment with names in BCP 47 language tag
format, with a private use
extension -x-icu appended, to distinguish them from
- libc locales. So de-x-icu would be an example name.
+ libc locales.
- With ICU, it is not sensible to enumerate all possible locale names. ICU
- uses a particular naming system for locales, but there are many more ways
- to name a locale than there are actually distinct locales. (In fact, any
- string will be accepted as a locale name.)
- See for
- information on ICU locale naming. initdb uses the ICU
- APIs to extract a set of distinct locales to populate the initial set of
- collations. Here are some example collations that might be created:
+ Here are some example collations that might be created:
@@ -695,32 +693,104 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
will draw an error along the lines of collation "de-x-icu" for
encoding "WIN874" does not exist>.
+
+
+
+
+ Creating New Collation Objects
+
+
+ If the standard and predefined collations are not sufficient, users can
+ create their own collation objects using the SQL
+ command .
+
+
+
+ The standard and predefined collations are in the
+ schema pg_catalog, like all predefined objects.
+ User-defined collations should be created in user schemas. This also
+ ensures that they are saved by pg_dump.
+
+
+
+ libc collations
+
+
+ New libc collations can be created like this:
+
+CREATE COLLATION german (provider = libc, locale = 'de_DE');
+
+ The exact values that are acceptable for the locale
+ clause in this command depend on the operating system. On Unix-like
+ systems, the command locale -a will show a list.
+
+
+
+ Since the predefined libc collations already include all collations
+ defined in the operating system when the database instance is
+ initialized, it is not often necessary to manually create new ones.
+ Reasons might be if a different naming system is desired (in which case
+ see also ) or if the operating system has
+ been upgraded to provide new locale definitions (in which case see
+ also pg_import_system_collations()).
+
+
+
+
+ ICU collations
ICU allows collations to be customized beyond the basic language+country
set that is preloaded by initdb. Users are encouraged
to define their own collation objects that make use of these facilities to
- suit the sorting behavior to their requirements. Here are some examples:
+ suit the sorting behavior to their requirements.
+ See
+ and for
+ information on ICU locale naming. The set of acceptable names and
+ attributes depends on the particular ICU version.
+
+
+
+ Here are some examples:
- CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk')
+ CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');
+ CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');German collation with phone book collation type
-
-
-
-
- CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji')
-
- Root collation with Emoji collation type, per Unicode Technical Standard #51
+ The first example selects the ICU locale using a language
+ tag per BCP 47. The second example uses the traditional
+ ICU-specific locale syntax. The first style is preferred going
+ forward, but it is not supported by older ICU versions.
+
+
+ Note that you can name the collation objects in the SQL environment
+ anything you want. In this example, we follow the naming style that
+ the predefined collations use, which in turn also follow BCP 47, but
+ that is not required for user-defined collations.
- CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit')
+ CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');
+ CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');
+
+
+ Root collation with Emoji collation type, per Unicode Technical Standard #51
+
+
+ Observe how in the traditional ICU locale naming system, the root
+ locale is selected by an empty string.
+
+
+
+
+
+ CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');
+ CREATE COLLATION digitslast (provider = icu, locale = 'en@colReorder=latn-digit');
Sort digits after Latin letters. (The default is digits before letters.)
@@ -729,7 +799,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
- CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper')
+ CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');
+ CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');
Sort upper-case letters before lower-case letters. (The default is
@@ -739,7 +810,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
- CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit')
+ CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-latn-digit');
+ CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=latn-digit');
Combines both of the above options.
@@ -748,7 +820,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
- CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true')
+ CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');
+ CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
Numeric ordering, sorts sequences of digits by their numeric value,
@@ -768,7 +841,8 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
repository.
The ICU Locale
Explorer can be used to check the details of a particular locale
- definition.
+ definition. The examples using the k* subtags require
+ at least ICU version 54.
@@ -779,10 +853,21 @@ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
strings that compare equal according to the collation but are not
byte-wise equal will be sorted according to their byte values.
-
-
-
+
+
+ By design, ICU will accept almost any string as a locale name and match
+ it to the closet locale it can provide, using the fallback procedure
+ described in its documentation. Thus, there will be no direct feedback
+ if a collation specification is composed using features that the given
+ ICU installation does not actually support. It is therefore recommended
+ to create application-level test cases to check that the collation
+ definitions satisfy one's requirements.
+
+
+
+
+ Copying Collations
@@ -796,13 +881,7 @@ CREATE COLLATION german FROM "de_DE";
CREATE COLLATION french FROM "fr-x-icu";
-
-
- The standard and predefined collations are in the
- schema pg_catalog, like all predefined objects.
- User-defined collations should be created in user schemas. This also
- ensures that they are saved by pg_dump.
-
+
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 2d3e050545c..f88758095f2 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -93,10 +93,7 @@ CREATE COLLATION [ IF NOT EXISTS ] name FROM
Use the specified operating system locale for
- the LC_COLLATE locale category. The locale
- must be applicable to the current database encoding.
- (See for the precise
- rules.)
+ the LC_COLLATE locale category.
@@ -107,10 +104,7 @@ CREATE COLLATION [ IF NOT EXISTS ] name FROM
Use the specified operating system locale for
- the LC_CTYPE locale category. The locale
- must be applicable to the current database encoding.
- (See for the precise
- rules.)
+ the LC_CTYPE locale category.
@@ -173,8 +167,13 @@ CREATE COLLATION [ IF NOT EXISTS ] name FROM
- See for more information about collation
- support in PostgreSQL.
+ See for more information on how to create collations.
+
+
+
+ When using the libc collation provider, the locale must
+ be applicable to the current database encoding.
+ See for the precise rules.
@@ -186,7 +185,14 @@ CREATE COLLATION [ IF NOT EXISTS ] name FROM fr_FR.utf8
(assuming the current database encoding is UTF8):
-CREATE COLLATION french (LOCALE = 'fr_FR.utf8');
+CREATE COLLATION french (locale = 'fr_FR.utf8');
+
+
+
+
+ To create a collation using the ICU provider using German phone book sort order:
+
+CREATE COLLATION german_phonebook (provider = icu, locale = 'de-u-co-phonebk');