mirror of https://github.com/MariaDB/server.git synced 2025-08-08 11:22:35 +03:00

MDEV-27009 Add UCA-14.0.0 collations

- Added one neutral and 22 tailored (language specific) collations based on
  Unicode Collation Algorithm version 14.0.0.

  Collations were added for Unicode character sets
  utf8mb3, utf8mb4, ucs2, utf16, utf32.

  Every tailoring was added with four accent and case
  sensitivity flag combinations, e.g.:

  * utf8mb4_uca1400_swedish_as_cs
  * utf8mb4_uca1400_swedish_as_ci
  * utf8mb4_uca1400_swedish_ai_cs
  * utf8mb4_uca1400_swedish_ai_ci

  and their _nopad_ variants:

  * utf8mb4_uca1400_swedish_nopad_as_cs
  * utf8mb4_uca1400_swedish_nopad_as_ci
  * utf8mb4_uca1400_swedish_nopad_ai_cs
  * utf8mb4_uca1400_swedish_nopad_ai_ci
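
The naming scheme above can be sketched in a few lines of Python. This is only an illustration of how the names are composed; the helper `uca1400_names` is hypothetical and is not part of the server source.

```python
from itertools import product

# Unicode character sets that received the new collations (from the text above).
CHARSETS = ["utf8mb3", "utf8mb4", "ucs2", "utf16", "utf32"]


def uca1400_names(charset, tailoring=None):
    """Return the eight uca1400 collation names for one character set.

    tailoring=None stands for the neutral (language-independent) collation.
    """
    names = []
    # Padding variant x accent sensitivity x case sensitivity = 2 * 2 * 2 names.
    for pad, accent, case in product(["", "nopad"], ["ai", "as"], ["ci", "cs"]):
        parts = [charset, "uca1400", tailoring, pad, accent, case]
        # Skip the empty components (no tailoring, or the default PAD variant).
        names.append("_".join(p for p in parts if p))
    return names


if __name__ == "__main__":
    for name in uca1400_names("utf8mb4", "swedish"):
        print(name)
```

With one neutral and 22 tailored collations, this gives 23 * 8 = 184 names per character set.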

- Introduced the concept of contextually typed named collations:

  CREATE DATABASE db1 CHARACTER SET utf8mb4;
  CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci);

  The idea is that there is no need to specify the character set prefix
  in the new collation names. It's enough to type just the suffix
  "uca1400_as_ci". The character set is taken from the context.

  In the above example script the context character set is utf8mb4.
  So the CREATE TABLE will make a column with the collation
  utf8mb4_uca1400_as_ci.

  Short collation names can be used in any part of the SQL syntax
  where the COLLATE clause is understood.
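
The resolution rule described above can be modelled as a small Python function. This is a sketch of the naming rule only, not the server's implementation; the function `resolve_collation` is hypothetical, and raising an error for a non-Unicode context character set is an assumption.

```python
# Character sets that have uca1400 collations (from this commit message).
UNICODE_CHARSETS = {"utf8mb3", "utf8mb4", "ucs2", "utf16", "utf32"}


def resolve_collation(name, context_charset):
    """Expand a short 'uca1400_...' name using the context character set."""
    if not name.startswith("uca1400_"):
        # Already a full name such as latin1_swedish_ci: nothing to resolve.
        return name
    if context_charset not in UNICODE_CHARSETS:
        # Assumption: a short name is rejected when the context character set
        # has no uca1400 collations.
        raise ValueError(f"{name} is not applicable to {context_charset}")
    return f"{context_charset}_{name}"


if __name__ == "__main__":
    # Mirrors the CREATE DATABASE / CREATE TABLE example above.
    print(resolve_collation("uca1400_as_ci", "utf8mb4"))
```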

- New collations are displayed only one time
  (without character set combinations) by these statements:

     SELECT * FROM INFORMATION_SCHEMA.COLLATIONS;
     SHOW COLLATION;

  For example, all these collations:
  - utf8mb3_uca1400_swedish_as_ci
  - utf8mb4_uca1400_swedish_as_ci
  - ucs2_uca1400_swedish_as_ci
  - utf16_uca1400_swedish_as_ci
  - utf32_uca1400_swedish_as_ci
  have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION,
  with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix
  without the character set name:

SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';

+-----------------------+
| COLLATION_NAME        |
+-----------------------+
| uca1400_swedish_as_ci |
+-----------------------+

  Note that the behaviour of old collations did not change.
  Non-Unicode collations (e.g. latin1_swedish_ci) and
  old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci)
  are still displayed with the character set prefix, as before.

- The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed.

  The NOT NULL constraint was removed from these columns:
  - CHARACTER_SET_NAME
  - ID
  - IS_DEFAULT
  and from the corresponding columns in SHOW COLLATION.

  For example:

SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
+-----------------------+--------------------+------+------------+
| COLLATION_NAME        | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
+-----------------------+--------------------+------+------------+
| uca1400_swedish_as_ci | NULL               | NULL | NULL       |
+-----------------------+--------------------+------+------------+

  The NULL value in these columns now means that the collation
  is applicable to multiple character sets.
  The behaviour of old collations did not change.
  Make sure your client programs can handle NULL values in these columns.
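
A client-side sketch of NULL handling, assuming a database driver that returns each row as a tuple and maps SQL NULL to Python `None`. The function `describe_collation` is hypothetical and only illustrates the check.

```python
def describe_collation(row):
    """Format one INFORMATION_SCHEMA.COLLATIONS row.

    row is (COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT),
    with SQL NULL represented as None.
    """
    name, charset, coll_id, is_default = row
    if charset is None:
        # New uca1400 collations: CHARACTER_SET_NAME, ID and IS_DEFAULT are NULL.
        return f"{name}: applicable to multiple character sets"
    default = " (default)" if is_default == "Yes" else ""
    return f"{name}: {charset}, id {coll_id}{default}"


if __name__ == "__main__":
    print(describe_collation(("uca1400_swedish_as_ci", None, None, None)))
    print(describe_collation(("latin1_swedish_ci", "latin1", 8, "Yes")))
```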

- The structure of the table
  INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed.

  Three new NOT NULL columns were added:
  - FULL_COLLATION_NAME
  - ID
  - IS_DEFAULT

  New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY.
  The column COLLATION_NAME contains the collation name without the character
  set prefix. The column FULL_COLLATION_NAME contains the collation name with
  the character set prefix.

  Old collations have the full collation name in both FULL_COLLATION_NAME and
  COLLATION_NAME.

SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4|latin1).*swedish.*ci$';
+-----------------------------+-------------------------------------+--------------------+------+------------+
| COLLATION_NAME              | FULL_COLLATION_NAME                 | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
+-----------------------------+-------------------------------------+--------------------+------+------------+
| latin1_swedish_ci           | latin1_swedish_ci                   | latin1             |    8 | Yes        |
| latin1_swedish_nopad_ci     | latin1_swedish_nopad_ci             | latin1             | 1032 |            |
| utf8mb4_swedish_ci          | utf8mb4_swedish_ci                  | utf8mb4            |  232 |            |
| uca1400_swedish_ai_ci       | utf8mb4_uca1400_swedish_ai_ci       | utf8mb4            | 2368 |            |
| uca1400_swedish_as_ci       | utf8mb4_uca1400_swedish_as_ci       | utf8mb4            | 2370 |            |
| uca1400_swedish_nopad_ai_ci | utf8mb4_uca1400_swedish_nopad_ai_ci | utf8mb4            | 2372 |            |
| uca1400_swedish_nopad_as_ci | utf8mb4_uca1400_swedish_nopad_as_ci | utf8mb4            | 2374 |            |
+-----------------------------+-------------------------------------+--------------------+------+------------+
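
A client that needs per-character-set details for a short name can index the rows of this table by (COLLATION_NAME, CHARACTER_SET_NAME). The sketch below uses a few rows copied from the output above; the helper `full_name_index` is hypothetical.

```python
# (COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID) rows taken
# from the sample query output above.
ROWS = [
    ("latin1_swedish_ci", "latin1_swedish_ci", "latin1", 8),
    ("utf8mb4_swedish_ci", "utf8mb4_swedish_ci", "utf8mb4", 232),
    ("uca1400_swedish_ai_ci", "utf8mb4_uca1400_swedish_ai_ci", "utf8mb4", 2368),
    ("uca1400_swedish_as_ci", "utf8mb4_uca1400_swedish_as_ci", "utf8mb4", 2370),
]


def full_name_index(rows):
    """Map (short or old name, character set) to (full name, collation ID)."""
    return {(name, charset): (full, coll_id) for name, full, charset, coll_id in rows}


if __name__ == "__main__":
    index = full_name_index(ROWS)
    print(index[("uca1400_swedish_as_ci", "utf8mb4")])
```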

- Other INFORMATION_SCHEMA queries:

  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS;
  SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
  SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS;

  display full collation names, including the character set prefix,
  for all collations, including new collations.

  Corresponding SHOW commands also display full collation names
  in collation related columns:

  SHOW CREATE TABLE t1;
  SHOW CREATE DATABASE db1;
  SHOW TABLE STATUS;
  SHOW CREATE FUNCTION f1;
  SHOW CREATE PROCEDURE p1;
  SHOW CREATE EVENT ev1;
  SHOW CREATE TRIGGER tr1;
  SHOW CREATE VIEW v1;

  These INFORMATION_SCHEMA queries and SHOW statements may change in
  the future to display short collation names.
Alexander Barkov
2021-11-28 16:55:15 +04:00
committed by Oleksandr Byelkin
parent 6bc10f8026
commit 133446828c
99 changed files with 46038 additions and 846 deletions


@@ -224,6 +224,190 @@ utf8mb4_general_nopad_ci utf8mb4 1069 # #
utf8mb4_nopad_bin utf8mb4 1070 # #
utf8mb4_unicode_nopad_ci utf8mb4 1248 # #
utf8mb4_unicode_520_nopad_ci utf8mb4 1270 # #
uca1400_ai_ci NULL NULL NULL # #
uca1400_ai_cs NULL NULL NULL # #
uca1400_as_ci NULL NULL NULL # #
uca1400_as_cs NULL NULL NULL # #
uca1400_nopad_ai_ci NULL NULL NULL # #
uca1400_nopad_ai_cs NULL NULL NULL # #
uca1400_nopad_as_ci NULL NULL NULL # #
uca1400_nopad_as_cs NULL NULL NULL # #
uca1400_icelandic_ai_ci NULL NULL NULL # #
uca1400_icelandic_ai_cs NULL NULL NULL # #
uca1400_icelandic_as_ci NULL NULL NULL # #
uca1400_icelandic_as_cs NULL NULL NULL # #
uca1400_icelandic_nopad_ai_ci NULL NULL NULL # #
uca1400_icelandic_nopad_ai_cs NULL NULL NULL # #
uca1400_icelandic_nopad_as_ci NULL NULL NULL # #
uca1400_icelandic_nopad_as_cs NULL NULL NULL # #
uca1400_latvian_ai_ci NULL NULL NULL # #
uca1400_latvian_ai_cs NULL NULL NULL # #
uca1400_latvian_as_ci NULL NULL NULL # #
uca1400_latvian_as_cs NULL NULL NULL # #
uca1400_latvian_nopad_ai_ci NULL NULL NULL # #
uca1400_latvian_nopad_ai_cs NULL NULL NULL # #
uca1400_latvian_nopad_as_ci NULL NULL NULL # #
uca1400_latvian_nopad_as_cs NULL NULL NULL # #
uca1400_romanian_ai_ci NULL NULL NULL # #
uca1400_romanian_ai_cs NULL NULL NULL # #
uca1400_romanian_as_ci NULL NULL NULL # #
uca1400_romanian_as_cs NULL NULL NULL # #
uca1400_romanian_nopad_ai_ci NULL NULL NULL # #
uca1400_romanian_nopad_ai_cs NULL NULL NULL # #
uca1400_romanian_nopad_as_ci NULL NULL NULL # #
uca1400_romanian_nopad_as_cs NULL NULL NULL # #
uca1400_slovenian_ai_ci NULL NULL NULL # #
uca1400_slovenian_ai_cs NULL NULL NULL # #
uca1400_slovenian_as_ci NULL NULL NULL # #
uca1400_slovenian_as_cs NULL NULL NULL # #
uca1400_slovenian_nopad_ai_ci NULL NULL NULL # #
uca1400_slovenian_nopad_ai_cs NULL NULL NULL # #
uca1400_slovenian_nopad_as_ci NULL NULL NULL # #
uca1400_slovenian_nopad_as_cs NULL NULL NULL # #
uca1400_polish_ai_ci NULL NULL NULL # #
uca1400_polish_ai_cs NULL NULL NULL # #
uca1400_polish_as_ci NULL NULL NULL # #
uca1400_polish_as_cs NULL NULL NULL # #
uca1400_polish_nopad_ai_ci NULL NULL NULL # #
uca1400_polish_nopad_ai_cs NULL NULL NULL # #
uca1400_polish_nopad_as_ci NULL NULL NULL # #
uca1400_polish_nopad_as_cs NULL NULL NULL # #
uca1400_estonian_ai_ci NULL NULL NULL # #
uca1400_estonian_ai_cs NULL NULL NULL # #
uca1400_estonian_as_ci NULL NULL NULL # #
uca1400_estonian_as_cs NULL NULL NULL # #
uca1400_estonian_nopad_ai_ci NULL NULL NULL # #
uca1400_estonian_nopad_ai_cs NULL NULL NULL # #
uca1400_estonian_nopad_as_ci NULL NULL NULL # #
uca1400_estonian_nopad_as_cs NULL NULL NULL # #
uca1400_spanish_ai_ci NULL NULL NULL # #
uca1400_spanish_ai_cs NULL NULL NULL # #
uca1400_spanish_as_ci NULL NULL NULL # #
uca1400_spanish_as_cs NULL NULL NULL # #
uca1400_spanish_nopad_ai_ci NULL NULL NULL # #
uca1400_spanish_nopad_ai_cs NULL NULL NULL # #
uca1400_spanish_nopad_as_ci NULL NULL NULL # #
uca1400_spanish_nopad_as_cs NULL NULL NULL # #
uca1400_swedish_ai_ci NULL NULL NULL # #
uca1400_swedish_ai_cs NULL NULL NULL # #
uca1400_swedish_as_ci NULL NULL NULL # #
uca1400_swedish_as_cs NULL NULL NULL # #
uca1400_swedish_nopad_ai_ci NULL NULL NULL # #
uca1400_swedish_nopad_ai_cs NULL NULL NULL # #
uca1400_swedish_nopad_as_ci NULL NULL NULL # #
uca1400_swedish_nopad_as_cs NULL NULL NULL # #
uca1400_turkish_ai_ci NULL NULL NULL # #
uca1400_turkish_ai_cs NULL NULL NULL # #
uca1400_turkish_as_ci NULL NULL NULL # #
uca1400_turkish_as_cs NULL NULL NULL # #
uca1400_turkish_nopad_ai_ci NULL NULL NULL # #
uca1400_turkish_nopad_ai_cs NULL NULL NULL # #
uca1400_turkish_nopad_as_ci NULL NULL NULL # #
uca1400_turkish_nopad_as_cs NULL NULL NULL # #
uca1400_czech_ai_ci NULL NULL NULL # #
uca1400_czech_ai_cs NULL NULL NULL # #
uca1400_czech_as_ci NULL NULL NULL # #
uca1400_czech_as_cs NULL NULL NULL # #
uca1400_czech_nopad_ai_ci NULL NULL NULL # #
uca1400_czech_nopad_ai_cs NULL NULL NULL # #
uca1400_czech_nopad_as_ci NULL NULL NULL # #
uca1400_czech_nopad_as_cs NULL NULL NULL # #
uca1400_danish_ai_ci NULL NULL NULL # #
uca1400_danish_ai_cs NULL NULL NULL # #
uca1400_danish_as_ci NULL NULL NULL # #
uca1400_danish_as_cs NULL NULL NULL # #
uca1400_danish_nopad_ai_ci NULL NULL NULL # #
uca1400_danish_nopad_ai_cs NULL NULL NULL # #
uca1400_danish_nopad_as_ci NULL NULL NULL # #
uca1400_danish_nopad_as_cs NULL NULL NULL # #
uca1400_lithuanian_ai_ci NULL NULL NULL # #
uca1400_lithuanian_ai_cs NULL NULL NULL # #
uca1400_lithuanian_as_ci NULL NULL NULL # #
uca1400_lithuanian_as_cs NULL NULL NULL # #
uca1400_lithuanian_nopad_ai_ci NULL NULL NULL # #
uca1400_lithuanian_nopad_ai_cs NULL NULL NULL # #
uca1400_lithuanian_nopad_as_ci NULL NULL NULL # #
uca1400_lithuanian_nopad_as_cs NULL NULL NULL # #
uca1400_slovak_ai_ci NULL NULL NULL # #
uca1400_slovak_ai_cs NULL NULL NULL # #
uca1400_slovak_as_ci NULL NULL NULL # #
uca1400_slovak_as_cs NULL NULL NULL # #
uca1400_slovak_nopad_ai_ci NULL NULL NULL # #
uca1400_slovak_nopad_ai_cs NULL NULL NULL # #
uca1400_slovak_nopad_as_ci NULL NULL NULL # #
uca1400_slovak_nopad_as_cs NULL NULL NULL # #
uca1400_spanish2_ai_ci NULL NULL NULL # #
uca1400_spanish2_ai_cs NULL NULL NULL # #
uca1400_spanish2_as_ci NULL NULL NULL # #
uca1400_spanish2_as_cs NULL NULL NULL # #
uca1400_spanish2_nopad_ai_ci NULL NULL NULL # #
uca1400_spanish2_nopad_ai_cs NULL NULL NULL # #
uca1400_spanish2_nopad_as_ci NULL NULL NULL # #
uca1400_spanish2_nopad_as_cs NULL NULL NULL # #
uca1400_roman_ai_ci NULL NULL NULL # #
uca1400_roman_ai_cs NULL NULL NULL # #
uca1400_roman_as_ci NULL NULL NULL # #
uca1400_roman_as_cs NULL NULL NULL # #
uca1400_roman_nopad_ai_ci NULL NULL NULL # #
uca1400_roman_nopad_ai_cs NULL NULL NULL # #
uca1400_roman_nopad_as_ci NULL NULL NULL # #
uca1400_roman_nopad_as_cs NULL NULL NULL # #
uca1400_persian_ai_ci NULL NULL NULL # #
uca1400_persian_ai_cs NULL NULL NULL # #
uca1400_persian_as_ci NULL NULL NULL # #
uca1400_persian_as_cs NULL NULL NULL # #
uca1400_persian_nopad_ai_ci NULL NULL NULL # #
uca1400_persian_nopad_ai_cs NULL NULL NULL # #
uca1400_persian_nopad_as_ci NULL NULL NULL # #
uca1400_persian_nopad_as_cs NULL NULL NULL # #
uca1400_esperanto_ai_ci NULL NULL NULL # #
uca1400_esperanto_ai_cs NULL NULL NULL # #
uca1400_esperanto_as_ci NULL NULL NULL # #
uca1400_esperanto_as_cs NULL NULL NULL # #
uca1400_esperanto_nopad_ai_ci NULL NULL NULL # #
uca1400_esperanto_nopad_ai_cs NULL NULL NULL # #
uca1400_esperanto_nopad_as_ci NULL NULL NULL # #
uca1400_esperanto_nopad_as_cs NULL NULL NULL # #
uca1400_hungarian_ai_ci NULL NULL NULL # #
uca1400_hungarian_ai_cs NULL NULL NULL # #
uca1400_hungarian_as_ci NULL NULL NULL # #
uca1400_hungarian_as_cs NULL NULL NULL # #
uca1400_hungarian_nopad_ai_ci NULL NULL NULL # #
uca1400_hungarian_nopad_ai_cs NULL NULL NULL # #
uca1400_hungarian_nopad_as_ci NULL NULL NULL # #
uca1400_hungarian_nopad_as_cs NULL NULL NULL # #
uca1400_sinhala_ai_ci NULL NULL NULL # #
uca1400_sinhala_ai_cs NULL NULL NULL # #
uca1400_sinhala_as_ci NULL NULL NULL # #
uca1400_sinhala_as_cs NULL NULL NULL # #
uca1400_sinhala_nopad_ai_ci NULL NULL NULL # #
uca1400_sinhala_nopad_ai_cs NULL NULL NULL # #
uca1400_sinhala_nopad_as_ci NULL NULL NULL # #
uca1400_sinhala_nopad_as_cs NULL NULL NULL # #
uca1400_german2_ai_ci NULL NULL NULL # #
uca1400_german2_ai_cs NULL NULL NULL # #
uca1400_german2_as_ci NULL NULL NULL # #
uca1400_german2_as_cs NULL NULL NULL # #
uca1400_german2_nopad_ai_ci NULL NULL NULL # #
uca1400_german2_nopad_ai_cs NULL NULL NULL # #
uca1400_german2_nopad_as_ci NULL NULL NULL # #
uca1400_german2_nopad_as_cs NULL NULL NULL # #
uca1400_vietnamese_ai_ci NULL NULL NULL # #
uca1400_vietnamese_ai_cs NULL NULL NULL # #
uca1400_vietnamese_as_ci NULL NULL NULL # #
uca1400_vietnamese_as_cs NULL NULL NULL # #
uca1400_vietnamese_nopad_ai_ci NULL NULL NULL # #
uca1400_vietnamese_nopad_ai_cs NULL NULL NULL # #
uca1400_vietnamese_nopad_as_ci NULL NULL NULL # #
uca1400_vietnamese_nopad_as_cs NULL NULL NULL # #
uca1400_croatian_ai_ci NULL NULL NULL # #
uca1400_croatian_ai_cs NULL NULL NULL # #
uca1400_croatian_as_ci NULL NULL NULL # #
uca1400_croatian_as_cs NULL NULL NULL # #
uca1400_croatian_nopad_ai_ci NULL NULL NULL # #
uca1400_croatian_nopad_ai_cs NULL NULL NULL # #
uca1400_croatian_nopad_as_ci NULL NULL NULL # #
uca1400_croatian_nopad_as_cs NULL NULL NULL # #
cp1251_bulgarian_ci cp1251 14 # #
cp1251_ukrainian_ci cp1251 23 # #
cp1251_bin cp1251 50 # #