mirror of https://github.com/MariaDB/server.git synced 2025-04-18 21:44:20 +03:00

MDEV-36213 Doubled memory usage (11.4.4 <-> 11.4.5)

Fix the code that adds MySQL _0900_ collations as _uca1400_ aliases
so that it no longer performs deep initialization of the corresponding
_uca1400_ collations.

Only basic initialization is now performed, which is enough to list
these collations (both _0900_ and _uca1400_) in queries to
INFORMATION_SCHEMA tables COLLATIONS and
COLLATION_CHARACTER_SET_APPLICABILITY,
as well as in SHOW COLLATION statements.

Deep initialization is now performed only when a collation
(either the _0900_ alias or the corresponding _uca1400_ collation)
is used for the very first time after the server startup.
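
The split between basic and deep initialization can be sketched as
follows. This is a minimal illustration, not the server code:
demo_collation, demo_collation_register() and demo_collation_use() are
hypothetical names, and a calloc'ed weight table stands in for whatever
the real deep initialization allocates.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a collation descriptor. */
typedef struct demo_collation
{
  const char *name;         /* Set during basic initialization */
  int ready;                /* Non-zero after deep initialization */
  unsigned short *weights;  /* Expensive data, allocated lazily */
} demo_collation;

enum { DEMO_WEIGHT_COUNT= 0x10000 };

/* Basic initialization: cheap, enough to list the collation by name. */
static void demo_collation_register(demo_collation *cl, const char *name)
{
  cl->name= name;
  cl->ready= 0;
  cl->weights= NULL;
}

/*
  Deep initialization: performed on the first use only.
  Returns 0 on success, 1 on allocation failure.
*/
static int demo_collation_use(demo_collation *cl)
{
  if (cl->ready)
    return 0;               /* Already initialized, nothing to allocate */
  cl->weights= calloc(DEMO_WEIGHT_COUNT, sizeof(*cl->weights));
  if (!cl->weights)
    return 1;
  cl->ready= 1;
  return 0;
}
```

Under these assumptions, registering every collation at startup costs
only a few pointers each, and the large allocation happens once, on the
first call to demo_collation_use().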

Refactoring was done to make the code easier to maintain:
- most of the _uca1400_ code was moved from ctype-uca.c
  to a new file ctype-uca1400.c
- most of the _0900_ code was moved from ctype-uca.c
  to a new file ctype-uca0900.c

Change details:

- The original function add_alias_for_collation() added by the patch for
  "MDEV-20912 Add support for utf8mb4_0900_* collations in MariaDB Server"
  was removed from mysys/charset.c, as it had two problems:

  a. it forced deep initialization of the _uca1400_ collations
     when adding _0900_ aliases for them at the server startup
     (the main reported problem)

  b. the collation initialization code in add_alias_for_collation()
     was related more to collations than to memory management,
     so /strings should be a better place for it than /mysys.

  The code from add_alias_for_collation() was split into separate functions.
  Cyclic dependency was removed. `#include <my_sys.h>` was removed
  from /strings/ctype-uca.c. Collations are now added using a callback
  function MY_CHARSET_LOADER::add_collation, as is done for
  user collations defined in Index.xml. The code in /mysys sets
  MY_CHARSET_LOADER::add_collation to add_compiled_collation().

- The function compare_collations() was removed.
  A new virtual function was added into my_collation_handler_st instead:

    my_bool (*eq_collation)(CHARSET_INFO *self, CHARSET_INFO *other);

  because it is the collation handler who knows how to detect equal
  collations by comparing only some of CHARSET_INFO members without
  their deep initialization.

  Three implementations were added:
  - my_ci_eq_collation_uca() for UCA collations, it compares
    _0900_ collations as equal to their corresponding _uca1400_ collations.
  - my_ci_eq_collation_utf8mb4_bin(), it compares
    utf8mb4_nopad_bin and utf8mb4_0900_bin as equal.
  - my_ci_eq_collation_generic() - the default implementation,
    which compares all collations as not equal.

  A C++ wrapper CHARSET_INFO::eq_collation() was added.
  The code in /sql was changed to use the wrapper instead of
  the former calls to the removed function compare_collations().

- A part of add_alias_for_collation() was moved into a new function
  my_ci_alloc(). It allocates memory for a new charset_info_st
  instance together with the collation name and the comment using a single
  MY_CHARSET_LOADER::once_alloc call, which points to my_once_alloc()
  in the server.

- A part of add_alias_for_collation() was moved into a new function
  my_ci_make_comment_for_alias(). It makes an "Alias for xxx" string,
  e.g. "Alias for utf8mb4_uca1400_swedish_ai_ci" in case of
  utf8mb4_sv_0900_ai_ci.

- A part of the code in create_tailoring() was moved to
  a new function my_uca1400_collation_get_initialized_shared_uca(),
  to reuse the code between _uca1400_ and _0900_ collations.

- A new function my_collation_id_is_mysql_uca0900() was added
  in addition to my_collation_id_is_mysql_uca1400().

- Functions to build collation names were added:
   my_uca0900_collation_build_name()
   my_uca1400_collation_build_name()

- A shared function was added:

  my_bool
  my_uca1400_collation_alloc_and_init(MY_CHARSET_LOADER *loader,
                                      LEX_CSTRING name,
                                      LEX_CSTRING comment,
                                      const uca_collation_def_param_t *param,
                                      uint id)

  It's reused to add _uca1400_ and _0900_ collations, with basic
  initialization (without deep initialization).

- The function add_compiled_collation() changed its return type from
  void to int, to make it compatible with MY_CHARSET_LOADER::add_collation.

- Functions mysql_uca0900_collation_definition_add(),
  mysql_uca0900_utf8mb4_collation_definitions_add(),
  mysql_utf8mb4_0900_bin_add() were added into ctype-uca0900.c.
  They get MY_CHARSET_LOADER as a parameter.

- Functions my_uca1400_collation_definition_add(),
  my_uca1400_collation_definitions_add() were moved from
  charset-def.c to strings/ctype-uca1400.c.
  The latter now accepts MY_CHARSET_LOADER as the first parameter
  instead of initializing a MY_CHARSET_LOADER inside.

- init_compiled_charsets() now initializes a MY_CHARSET_LOADER
  variable and passes it to all functions adding collations:
  - my_uca1400_collation_definitions_add()
  - mysql_uca0900_utf8mb4_collation_definitions_add()
  - mysql_utf8mb4_0900_bin_add()

- A new structure was added into ctype-uca.h:

  typedef struct uca_collation_def_param
  {
    my_cs_encoding_t cs_id;
    uint tailoring_id;
    uint nopad_flags;
    uint level_flags;
  } uca_collation_def_param_t;

  It simplifies reusing the code for _uca1400_ and _0900_ collations.

- The definition of MY_UCA1400_COLLATION_DEFINITION was
  moved from ctype-uca.c to ctype-uca1400.h, to reuse
  the code for _uca1400_ and _0900_ collations.

- The definitions of "MY_UCA_INFO my_uca_v1400" and
  "MY_UCA_INFO my_uca1400_info_tailored[][]" were moved from
  ctype-uca.c to ctype-uca1400.c.

- The definitions/declarations of:
  - mysql_0900_collation_start,
  - struct mysql_0900_to_mariadb_1400_mapping
  - mysql_0900_to_mariadb_1400_mapping
  - mysql_utf8mb4_0900_collation_definitions_add()
  were moved from ctype-uca.c to ctype-uca0900.c

- Functions
  my_uca1400_make_builtin_collation_id()
  my_uca1400_collation_definition_init()
  my_uca1400_collation_id_uca400_compat()
  my_ci_get_collation_name_uca1400_context()
  were moved from ctype-uca.c to ctype-uca1400.c and ctype-uca1400.h

- A part of my_uca1400_collation_definition_init()
  was moved into my_uca0520_builtin_collation_by_id(),
  to make functions smaller.
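
The single-allocation layout described for my_ci_alloc() and the
"Alias for xxx" comment built by my_ci_make_comment_for_alias() can be
sketched as below. This is a simplified illustration under stated
assumptions: demo_ci, demo_ci_alloc() and demo_make_comment_for_alias()
are hypothetical names, and plain malloc() is used where the server goes
through MY_CHARSET_LOADER::once_alloc.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for charset_info_st. */
typedef struct demo_ci
{
  const char *name;
  const char *comment;
} demo_ci;

/*
  Build "Alias for <name>" in buf.
  Returns the length stored in buf (truncated to fit),
  not counting the terminating NUL.
*/
static size_t demo_make_comment_for_alias(char *buf, size_t size,
                                          const char *name)
{
  int n;
  if (size == 0)
    return 0;
  n= snprintf(buf, size, "Alias for %s", name);
  if (n < 0)
    return 0;
  return (size_t) n < size ? (size_t) n : size - 1;
}

/*
  Allocate the structure, the name and the comment in one block:
  [demo_ci][name\0][comment\0]
  Returns NULL on allocation failure.
*/
static demo_ci *demo_ci_alloc(const char *name, const char *comment)
{
  size_t name_size= strlen(name) + 1;
  size_t comment_size= strlen(comment) + 1;
  demo_ci *ci= malloc(sizeof(*ci) + name_size + comment_size);
  char *str;
  if (!ci)
    return NULL;
  str= (char *) (ci + 1);          /* Strings start right after the struct */
  memcpy(str, name, name_size);
  ci->name= str;
  memcpy(str + name_size, comment, comment_size);
  ci->comment= str + name_size;
  return ci;
}
```

With one block per collation, a single free() releases everything in
this sketch; with a once-style allocator nothing needs to be released
individually at all.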
Alexander Barkov 2025-03-22 12:45:13 +04:00
parent 0dad1458e7
commit 10c063f9f0
38 changed files with 1498 additions and 646 deletions


@ -568,6 +568,22 @@ struct my_collation_handler_st
uint (*get_id)(CHARSET_INFO *cs, my_collation_id_type_t type);
LEX_CSTRING (*get_collation_name)(CHARSET_INFO *cs,
my_collation_name_mode_t mode);
/*
Check if two collations are equally defined, so DTCollation aggregation
code considers them as equal. This is useful for collation aliases,
e.g. for MySQL 0900 collation aliases for 1400 MariaDB collations.
For example, these queries work without raising "Illegal mix of collations":
SELECT _utf8mb4'a' COLLATE utf8mb4_uca1400_nopad_ai_ci =
_utf8mb4'a' COLLATE utf8mb4_0900_ai_ci;
SELECT ... WHERE column_with_collation_utf8mb4_uca1400_nopad_ai_ci =
column_with_collation_utf8mb4_0900_ai_ci;
@return 0 Different
@return 1 Identical
*/
my_bool (*eq_collation)(CHARSET_INFO *self, CHARSET_INFO *other);
};
@ -1110,6 +1126,19 @@ struct charset_info_st
{
return (coll->get_collation_name)(this, mode);
}
/*
Check if two collations are equally defined. For details
see the definition of eq_collation() in my_collation_handler_st.
@return 0 Different
@return 1 Identical
*/
my_bool eq_collation(CHARSET_INFO *rhs) const
{
return this == rhs || (coll->eq_collation)(this, rhs);
}
#endif /* __cplusplus */
};
@ -1693,7 +1722,6 @@ my_bool my_propagate_complex(CHARSET_INFO *cs, const uchar *str, size_t len);
uint my_ci_get_id_generic(CHARSET_INFO *cs, my_collation_id_type_t type);
LEX_CSTRING my_ci_get_collation_name_generic(CHARSET_INFO *cs,
my_collation_name_mode_t mode);
-my_bool compare_collations(CHARSET_INFO *cs1, CHARSET_INFO *cs2);
typedef struct
{

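The eq_collation() dispatch declared in the hunks above can be sketched
in plain C as follows. It is only an illustration: demo_cs, demo_handler
and the canonical_id member are hypothetical, and comparing a shared
canonical id is an assumed stand-in for whatever members the real
my_ci_eq_collation_uca() compares.

```c
#include <stddef.h>

typedef struct demo_cs demo_cs;

/* Hypothetical, reduced stand-in for my_collation_handler_st. */
typedef struct demo_handler
{
  /* Returns 1 if the two collations are equally defined, 0 otherwise */
  int (*eq_collation)(const demo_cs *self, const demo_cs *other);
} demo_handler;

struct demo_cs
{
  const demo_handler *coll;
  unsigned canonical_id;   /* Assumption: shared by a collation and its alias */
};

/* Default implementation: distinct objects are never equal. */
static int demo_eq_generic(const demo_cs *self, const demo_cs *other)
{
  (void) self;
  (void) other;
  return 0;
}

/* Alias-aware implementation: same handler and same canonical id. */
static int demo_eq_by_canonical_id(const demo_cs *self, const demo_cs *other)
{
  return self->coll == other->coll &&
         self->canonical_id == other->canonical_id;
}

/* Wrapper: pointer identity first, then ask the handler of the left side. */
static int demo_cs_eq(const demo_cs *a, const demo_cs *b)
{
  return a == b || a->coll->eq_collation(a, b);
}
```

Because only cheap members are compared, neither side needs deep
initialization before the check.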

@ -1123,11 +1123,8 @@ static inline my_bool my_charset_same(CHARSET_INFO *cs1, CHARSET_INFO *cs2)
return (cs1->cs_name.str == cs2->cs_name.str);
}
extern my_bool init_compiled_charsets(myf flags);
-extern void add_compiled_collation(struct charset_info_st *cs);
+extern int add_compiled_collation(struct charset_info_st *cs);
extern void add_compiled_extra_collation(struct charset_info_st *cs);
-extern my_bool add_alias_for_collation(LEX_CSTRING *collation_name,
-uint collation_id,
-LEX_CSTRING *alias, uint alias_id);
extern size_t escape_string_for_mysql(CHARSET_INFO *charset_info,
char *to, size_t to_length,
const char *from, size_t length,


@ -199,6 +199,153 @@ CREATE OR REPLACE TABLE t1 (p int primary key auto_increment, a VARCHAR(10), key
alter table t1 modify a varchar(10) collate utf8mb4_uca1400_swedish_nopad_ai_ci, algorithm=nocopy;
drop table t1;
#
# Print protocol collation IDs for 0900 collations
# They should be known to libmariadb
# See libmariadb/libmariadb/ma_charset.c
#
FOR rec IN (SELECT COLLATION_NAME
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE CHARACTER_SET_NAME='utf8mb4'
AND COLLATION_NAME RLIKE '_0900_'
ORDER BY ID)
DO
EXECUTE IMMEDIATE CONCAT('SET NAMES utf8mb4 COLLATE ', rec.COLLATION_NAME);
SELECT rec.COLLATION_NAME;
END FOR;
$$
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 18 Y 0 0 255
utf8mb4_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 24 Y 0 0 256
utf8mb4_de_pb_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 257
utf8mb4_is_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 258
utf8mb4_lv_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 259
utf8mb4_ro_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 260
utf8mb4_sl_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 261
utf8mb4_pl_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 262
utf8mb4_et_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 263
utf8mb4_es_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 264
utf8mb4_sv_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 265
utf8mb4_tr_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 266
utf8mb4_cs_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 267
utf8mb4_da_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 268
utf8mb4_lt_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 269
utf8mb4_sk_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 26 Y 0 0 270
utf8mb4_es_trad_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 271
utf8mb4_la_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 273
utf8mb4_eo_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 274
utf8mb4_hu_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 275
utf8mb4_hr_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 277
utf8mb4_vi_0900_ai_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 18 Y 0 0 278
utf8mb4_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 24 Y 0 0 279
utf8mb4_de_pb_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 280
utf8mb4_is_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 281
utf8mb4_lv_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 282
utf8mb4_ro_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 283
utf8mb4_sl_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 284
utf8mb4_pl_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 285
utf8mb4_et_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 286
utf8mb4_es_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 287
utf8mb4_sv_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 288
utf8mb4_tr_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 289
utf8mb4_cs_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 290
utf8mb4_da_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 291
utf8mb4_lt_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 292
utf8mb4_sk_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 26 Y 0 0 293
utf8mb4_es_trad_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 294
utf8mb4_la_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 296
utf8mb4_eo_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 297
utf8mb4_hu_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 298
utf8mb4_hr_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 21 Y 0 0 300
utf8mb4_vi_0900_as_cs
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 18 Y 0 0 305
utf8mb4_0900_as_ci
Catalog Database Table Table_alias Column Column_alias Type Length Max length Is_null Flags Decimals Charsetnr
def COLLATION_NAME rec.COLLATION_NAME 253 256 16 Y 0 0 309
utf8mb4_0900_bin
#
# MDEV-36361 Wrong utf8mb4_0900_bin alias for utf8mb4_bin (should be utf8mb4_nopad_bin)
#
SELECT collation_name, id, comment


@ -84,6 +84,32 @@ CREATE OR REPLACE TABLE t1 (p int primary key auto_increment, a VARCHAR(10), key
alter table t1 modify a varchar(10) collate utf8mb4_uca1400_swedish_nopad_ai_ci, algorithm=nocopy;
drop table t1;
--echo #
--echo # Print protocol collation IDs for 0900 collations
--echo # They should be known to libmariadb
--echo # See libmariadb/libmariadb/ma_charset.c
--echo #
--disable_column_names
--disable_ps_protocol
--enable_metadata
DELIMITER $$;
FOR rec IN (SELECT COLLATION_NAME
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE CHARACTER_SET_NAME='utf8mb4'
AND COLLATION_NAME RLIKE '_0900_'
ORDER BY ID)
DO
EXECUTE IMMEDIATE CONCAT('SET NAMES utf8mb4 COLLATE ', rec.COLLATION_NAME);
SELECT rec.COLLATION_NAME;
END FOR;
$$
DELIMITER ;$$
--disable_metadata
--enable_ps_protocol
--enable_column_names
--echo #
--echo # MDEV-36361 Wrong utf8mb4_0900_bin alias for utf8mb4_bin (should be utf8mb4_nopad_bin)
--echo #


@ -0,0 +1 @@
--init_connect="set @a='ctype_utf8mb4_0900_mem - run a dedicated mariadbd for this test'"


@ -0,0 +1,108 @@
#
# MDEV-36213 Doubled memory usage (11.4.4 <-> 11.4.5)
#
SET NAMES utf8mb4;
CREATE FUNCTION memory_used() RETURNS BIGINT RETURN
(SELECT variable_value
FROM information_schema.global_status
WHERE variable_name='memory_used');
CREATE PROCEDURE p1(cl VARCHAR(64))
BEGIN
DECLARE mem_before BIGINT;
DECLARE mem_after BIGINT;
DECLARE query TEXT DEFAULT CONCAT('SET @a= _utf8mb4 0x20 COLLATE ', cl);
SET mem_before= memory_used();
EXECUTE IMMEDIATE query;
SET mem_after= memory_used();
SELECT
CASE
WHEN mem_after-mem_before >= 1024*1024 THEN '>=1M'
ELSE '<1M'
END AS diff,
CONCAT(query,';') AS query;
END;
/
CREATE PROCEDURE p2(cl VARCHAR(64))
BEGIN
DECLARE mem_before BIGINT;
DECLARE mem_after BIGINT;
DECLARE query TEXT DEFAULT CONCAT(
'SELECT id, full_collation_name'
' FROM information_schema.collation_character_set_applicability'
' WHERE full_collation_name LIKE ''PATTERN'' ORDER BY id');
SET query= REPLACE(query, 'PATTERN', cl);
SELECT query;
SET mem_before= memory_used();
EXECUTE IMMEDIATE query;
SET mem_after=memory_used();
SELECT
CASE
WHEN mem_after-mem_before >= 1024*1024 THEN '>=1M'
ELSE '<1M'
END AS diff;
END;
/
#
# Initialize spanish2 collations, an UCA-14.0.0 collation goes first
#
>=1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_spanish2_ai_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_spanish2_ai_cs;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_spanish2_as_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_spanish2_as_cs;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_spanish2_nopad_ai_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_spanish2_nopad_ai_cs;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_spanish2_nopad_as_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_spanish2_nopad_as_cs;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_es_trad_0900_ai_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_es_trad_0900_as_cs;
#
# I_S queries for initialized collations should not add memory
#
SELECT id, full_collation_name FROM information_schema.collation_character_set_applicability WHERE full_collation_name LIKE 'utf8mb4_uca1400_spanish2%' ORDER BY id
2416 utf8mb4_uca1400_spanish2_ai_ci
2417 utf8mb4_uca1400_spanish2_ai_cs
2418 utf8mb4_uca1400_spanish2_as_ci
2419 utf8mb4_uca1400_spanish2_as_cs
2420 utf8mb4_uca1400_spanish2_nopad_ai_ci
2421 utf8mb4_uca1400_spanish2_nopad_ai_cs
2422 utf8mb4_uca1400_spanish2_nopad_as_ci
2423 utf8mb4_uca1400_spanish2_nopad_as_cs
<1M
SELECT id, full_collation_name FROM information_schema.collation_character_set_applicability WHERE full_collation_name LIKE 'utf8mb4_%es_trad_0900%' ORDER BY id
270 utf8mb4_es_trad_0900_ai_ci
293 utf8mb4_es_trad_0900_as_cs
<1M
#
# I_S queries for not initialized collations should not add memory
#
SELECT id, full_collation_name FROM information_schema.collation_character_set_applicability WHERE full_collation_name LIKE 'utf8mb4_uca1400_german2%' ORDER BY id
2464 utf8mb4_uca1400_german2_ai_ci
2465 utf8mb4_uca1400_german2_ai_cs
2466 utf8mb4_uca1400_german2_as_ci
2467 utf8mb4_uca1400_german2_as_cs
2468 utf8mb4_uca1400_german2_nopad_ai_ci
2469 utf8mb4_uca1400_german2_nopad_ai_cs
2470 utf8mb4_uca1400_german2_nopad_as_ci
2471 utf8mb4_uca1400_german2_nopad_as_cs
<1M
SELECT id, full_collation_name FROM information_schema.collation_character_set_applicability WHERE full_collation_name LIKE 'utf8mb4_%de_pb_0900%' ORDER BY id
256 utf8mb4_de_pb_0900_ai_ci
279 utf8mb4_de_pb_0900_as_cs
<1M
#
# Initialize german2 collations, an UCA-9.0.0 alias goes first
#
>=1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_de_pb_0900_ai_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_de_pb_0900_as_cs;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_german2_ai_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_german2_ai_cs;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_german2_as_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_german2_as_cs;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_german2_nopad_ai_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_german2_nopad_ai_cs;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_german2_nopad_as_ci;
<1M SET @a= _utf8mb4 0x20 COLLATE utf8mb4_uca1400_german2_nopad_as_cs;
DROP PROCEDURE p2;
DROP PROCEDURE p1;
DROP FUNCTION memory_used;
# End of 11.4 tests


@ -0,0 +1,112 @@
--echo #
--echo # MDEV-36213 Doubled memory usage (11.4.4 <-> 11.4.5)
--echo #
SET NAMES utf8mb4;
CREATE FUNCTION memory_used() RETURNS BIGINT RETURN
(SELECT variable_value
FROM information_schema.global_status
WHERE variable_name='memory_used');
DELIMITER /;
CREATE PROCEDURE p1(cl VARCHAR(64))
BEGIN
DECLARE mem_before BIGINT;
DECLARE mem_after BIGINT;
DECLARE query TEXT DEFAULT CONCAT('SET @a= _utf8mb4 0x20 COLLATE ', cl);
SET mem_before= memory_used();
EXECUTE IMMEDIATE query;
SET mem_after= memory_used();
SELECT
CASE
WHEN mem_after-mem_before >= 1024*1024 THEN '>=1M'
ELSE '<1M'
END AS diff,
CONCAT(query,';') AS query;
END;
/
CREATE PROCEDURE p2(cl VARCHAR(64))
BEGIN
DECLARE mem_before BIGINT;
DECLARE mem_after BIGINT;
DECLARE query TEXT DEFAULT CONCAT(
'SELECT id, full_collation_name'
' FROM information_schema.collation_character_set_applicability'
' WHERE full_collation_name LIKE ''PATTERN'' ORDER BY id');
SET query= REPLACE(query, 'PATTERN', cl);
SELECT query;
SET mem_before= memory_used();
EXECUTE IMMEDIATE query;
SET mem_after=memory_used();
SELECT
CASE
WHEN mem_after-mem_before >= 1024*1024 THEN '>=1M'
ELSE '<1M'
END AS diff;
END;
/
DELIMITER ;/
--disable_column_names
--disable_query_log
--echo #
--echo # Initialize spanish2 collations, an UCA-14.0.0 collation goes first
--echo #
CALL p1('utf8mb4_uca1400_spanish2_ai_ci');
CALL p1('utf8mb4_uca1400_spanish2_ai_cs');
CALL p1('utf8mb4_uca1400_spanish2_as_ci');
CALL p1('utf8mb4_uca1400_spanish2_as_cs');
CALL p1('utf8mb4_uca1400_spanish2_nopad_ai_ci');
CALL p1('utf8mb4_uca1400_spanish2_nopad_ai_cs');
CALL p1('utf8mb4_uca1400_spanish2_nopad_as_ci');
CALL p1('utf8mb4_uca1400_spanish2_nopad_as_cs');
CALL p1('utf8mb4_es_trad_0900_ai_ci');
CALL p1('utf8mb4_es_trad_0900_as_cs');
--echo #
--echo # I_S queries for initialized collations should not add memory
--echo #
CALL p2('utf8mb4_uca1400_spanish2%');
CALL p2('utf8mb4_%es_trad_0900%');
--echo #
--echo # I_S queries for not initialized collations should not add memory
--echo #
CALL p2('utf8mb4_uca1400_german2%');
CALL p2('utf8mb4_%de_pb_0900%');
--echo #
--echo # Initialize german2 collations, an UCA-9.0.0 alias goes first
--echo #
CALL p1('utf8mb4_de_pb_0900_ai_ci');
CALL p1('utf8mb4_de_pb_0900_as_cs');
CALL p1('utf8mb4_uca1400_german2_ai_ci');
CALL p1('utf8mb4_uca1400_german2_ai_cs');
CALL p1('utf8mb4_uca1400_german2_as_ci');
CALL p1('utf8mb4_uca1400_german2_as_cs');
CALL p1('utf8mb4_uca1400_german2_nopad_ai_ci');
CALL p1('utf8mb4_uca1400_german2_nopad_ai_cs');
CALL p1('utf8mb4_uca1400_german2_nopad_as_ci');
CALL p1('utf8mb4_uca1400_german2_nopad_as_cs');
--enable_query_log
--enable_column_names
DROP PROCEDURE p2;
DROP PROCEDURE p1;
DROP FUNCTION memory_used;
--echo # End of 11.4 tests


@ -184,76 +184,10 @@ extern struct charset_info_st my_charset_utf8mb4_unicode_520_nopad_ci;
#endif /* HAVE_UCA_COLLATIONS */
static my_bool
my_uca1400_collation_definition_add(MY_CHARSET_LOADER *loader,
my_cs_encoding_t charset_id,
uint tailoring_id,
my_bool nopad,
my_bool secondary_level,
my_bool tertiary_level)
{
struct charset_info_st *tmp;
uint collation_id= my_uca1400_make_builtin_collation_id(charset_id,
tailoring_id,
nopad,
secondary_level,
tertiary_level);
if (!collation_id)
return FALSE;
if (!(tmp= (struct charset_info_st*)
my_once_alloc(sizeof(CHARSET_INFO),MYF(0))))
return TRUE;
if (my_uca1400_collation_definition_init(loader, tmp, collation_id))
return TRUE;
add_compiled_collation(tmp);
return FALSE;
}
static my_bool
my_uca1400_collation_definitions_add()
{
my_cs_encoding_t charset_id;
MY_CHARSET_LOADER loader;
my_charset_loader_init_mysys(&loader);
for (charset_id= (my_cs_encoding_t) 0;
charset_id <= (my_cs_encoding_t) MY_CS_ENCODING_LAST;
charset_id++)
{
uint tailoring_id;
for (tailoring_id= 0 ;
tailoring_id < MY_UCA1400_COLLATION_DEFINITION_COUNT;
tailoring_id++)
{
uint nopad;
for (nopad= 0; nopad < 2; nopad++)
{
uint secondary_level;
for (secondary_level= 0; secondary_level < 2; secondary_level++)
{
if (my_uca1400_collation_definition_add(&loader,
charset_id, tailoring_id,
(my_bool) nopad,
(my_bool) secondary_level,
FALSE))
return TRUE;
if (my_uca1400_collation_definition_add(&loader,
charset_id, tailoring_id,
(my_bool) nopad,
(my_bool) secondary_level,
TRUE))
return TRUE;
}
}
}
}
return FALSE;
}
my_bool init_compiled_charsets(myf flags __attribute__((unused)))
{
CHARSET_INFO *cs;
MY_CHARSET_LOADER loader;
add_compiled_collation(&my_charset_bin);
add_compiled_collation(&my_charset_filename);
@ -541,9 +475,22 @@ my_bool init_compiled_charsets(myf flags __attribute__((unused)))
for (cs=compiled_charsets; cs->coll_name.str; cs++)
add_compiled_extra_collation((struct charset_info_st *) cs);
-if (my_uca1400_collation_definitions_add())
+/*
+my_charset_loader_init_mysys() initializes
+MY_CHARSET_LOADER::add_collation to the function
+add_collation() defined in charset.c
+Let's reset it to add_compiled_collation().
+*/
+my_charset_loader_init_mysys(&loader);
+loader.add_collation= add_compiled_collation;
+if (my_uca1400_collation_definitions_add(&loader))
return TRUE;
-if (mysql_utf8mb4_0900_collation_definitions_add())
+if (mysql_uca0900_utf8mb4_collation_definitions_add(&loader))
return TRUE;
+if (mysql_utf8mb4_0900_bin_add(&loader))
+return TRUE;
return FALSE;

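The callback wiring in init_compiled_charsets() above, combined with the
nopad / secondary / tertiary enumeration, can be sketched as follows.
All names here (demo_loader, demo_add_compiled_collation(),
demo_definitions_add()) are hypothetical, and the id layout is an
assumption chosen only to resemble the ordering seen in the test output
(ai_ci, ai_cs, as_ci, as_cs, then the nopad variants).

```c
#include <stddef.h>

typedef struct demo_collation
{
  unsigned id;
} demo_collation;

/* Hypothetical, reduced stand-in for MY_CHARSET_LOADER. */
typedef struct demo_loader
{
  /* Returns 0 on success, non-zero on error */
  int (*add_collation)(demo_collation *cl);
} demo_loader;

enum { DEMO_MAX_COLLATIONS= 64 };
static demo_collation *demo_registry[DEMO_MAX_COLLATIONS];
static demo_collation demo_storage[DEMO_MAX_COLLATIONS];
static unsigned demo_count;

/* Returns int so that it can be assigned to demo_loader::add_collation. */
static int demo_add_compiled_collation(demo_collation *cl)
{
  if (cl->id >= DEMO_MAX_COLLATIONS)
    return 1;
  demo_registry[cl->id]= cl;
  return 0;
}

/*
  Register the 8 variants of one tailoring through the loader callback.
  Returns 0 on success, 1 on the first failure.
*/
static int demo_definitions_add(demo_loader *loader, unsigned base_id)
{
  unsigned nopad, secondary, tertiary;
  for (nopad= 0; nopad < 2; nopad++)
    for (secondary= 0; secondary < 2; secondary++)
      for (tertiary= 0; tertiary < 2; tertiary++)
      {
        demo_collation *cl;
        if (demo_count >= DEMO_MAX_COLLATIONS)
          return 1;
        cl= &demo_storage[demo_count++];
        cl->id= base_id + (nopad << 2) + (secondary << 1) + tertiary;
        if (loader->add_collation(cl))
          return 1;
      }
  return 0;
}
```

Passing the loader in, rather than creating one inside the function,
lets the caller decide how a collation gets registered, which is the
design choice the commit message describes.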

@ -597,7 +597,7 @@ CHARSET_INFO *default_charset_info = &my_charset_latin1;
All related character sets should share same cname
*/
-void add_compiled_collation(struct charset_info_st *cs)
+int add_compiled_collation(struct charset_info_st *cs)
{
DBUG_ASSERT(cs->number < array_elements(all_charsets));
all_charsets[cs->number]= cs;
@ -613,6 +613,7 @@ void add_compiled_collation(struct charset_info_st *cs)
DBUG_ASSERT(org->cs_name.length == strlen(cs->cs_name.str));
#endif
}
+return 0;
}
@ -640,69 +641,6 @@ void add_compiled_extra_collation(struct charset_info_st *cs)
}
/*
Add an alias for a collation with an unique id
Used to add MySQL utf8mb4_0900 collations to MariaDB as an alias for the
corresponding utf8mb4_1400 collation
*/
my_bool add_alias_for_collation(LEX_CSTRING *collation_name, uint org_id,
LEX_CSTRING *alias, uint alias_id)
{
char *coll_name, *comment;
struct charset_info_st *new_ci;
CHARSET_INFO *org;
MY_CHARSET_LOADER loader;
char comment_buff[64+15];
size_t comment_length;
DBUG_ASSERT(all_charsets[org_id]);
if (!(org= all_charsets[org_id]))
return 1;
DBUG_ASSERT(!my_strcasecmp(&my_charset_latin1, org->coll_name.str,
collation_name->str));
#ifdef DEBUG_PRINT_ALIAS
fprintf(stderr, "alias: %s collation: %s org_id: %u\n",
alias->str, collation_name->str, org_id);
#endif
/*
We have to init the character set to ensure it is not changed after we copy
it.
*/
my_charset_loader_init_mysys(&loader);
if (my_ci_init_charset((struct charset_info_st*) org, &loader) ||
my_ci_init_collation((struct charset_info_st*) org, &loader) ||
(org->m_ctype &&
init_state_maps((struct charset_info_st*) org)))
return 1;
((struct charset_info_st*) org)->state|= MY_CS_READY;
comment_length= strxnmov(comment_buff, sizeof(comment_buff)-1,
"Alias for ", collation_name->str,
NullS) - comment_buff;
if (!(new_ci= ((struct charset_info_st*)
my_once_alloc(sizeof(CHARSET_INFO) +
alias->length + comment_length + 2,
MYF(MY_WME)))))
return 1;
coll_name= (char*) (new_ci+1);
comment= coll_name + alias->length +1;
memcpy((void*) new_ci, org, sizeof(CHARSET_INFO));
(new_ci->coll_name.str)= coll_name;
memcpy(coll_name, alias->str, alias->length+1);
memcpy(comment, comment_buff, comment_length+1);
new_ci->coll_name.length= alias->length;
new_ci->comment= comment;
new_ci->number= alias_id;
all_charsets[alias_id]= new_ci;
return 0;
}
static my_pthread_once_t charsets_initialized= MY_PTHREAD_ONCE_INIT;
static my_pthread_once_t charsets_template= MY_PTHREAD_ONCE_INIT;
@ -722,54 +660,6 @@ my_bool my_collation_is_known_id(uint id)
}
/*
Compare if two collations are identical.
They are identical if all slots are identical except collation name and
number. Note that alias collations are made by memcpy(), which means that
also the also padding in the structures are identical.
Note that this code assumes knowledge of the CHARSET_INFO structure.
Especially the place of number, cs_name, coll_name and tailoring.
Other option would have been to add a new member 'alias_collation'
into CHARSET_INFO where all identical collations would point to,
but that would have changed the CHARSET_INFO structure which would
have required a lot more changes.
@return 0 Identical
@return 1 Different
*/
my_bool compare_collations(CHARSET_INFO *cs1, CHARSET_INFO *cs2)
{
size_t length;
if (cs1 == cs2)
return 0;
/* Quick check to detect different collation */
if (cs1->cset != cs2->cset || cs1->coll != cs2->coll ||
cs1->uca != cs2->uca)
goto diff;
/* We don't compare character set number */
if (cs1->primary_number != cs2->primary_number)
goto diff;
if (cs1->binary_number != cs2->binary_number)
goto diff;
if (cs1->state != cs2->state)
goto diff;
/* Compare everything after comment_name */
length= sizeof(CHARSET_INFO) - (((char*) &cs1->tailoring) - (char*) cs1);
if (!memcmp(&cs1->tailoring, &cs2->tailoring, length))
return 0;
diff:
return 1;
}
/*
Collation use statistics functions do not lock
counters to avoid mutex contention. This can lose


@ -2719,7 +2719,7 @@ bool Field_null::is_equal(const Column_definition &new_field) const
{
DBUG_ASSERT(!compression_method());
return (new_field.type_handler() == type_handler() &&
!compare_collations(new_field.charset, field_charset()) &&
new_field.charset->eq_collation(field_charset()) &&
new_field.length == max_display_length());
}
@ -7492,7 +7492,7 @@ bool Field_string::is_equal(const Column_definition &new_field) const
DBUG_ASSERT(!compression_method());
return (new_field.type_handler() == type_handler() &&
new_field.char_length == char_length() &&
!compare_collations(new_field.charset, field_charset()) &&
new_field.charset->eq_collation(field_charset()) &&
new_field.length == max_display_length());
}
@ -7516,7 +7516,7 @@ Field_longstr::cmp_to_string_with_same_collation(const Item_bool_func *cond,
{
return (!cmp_is_done_using_type_handler_of_this(cond, item) ?
Data_type_compatibility::INCOMPATIBLE_DATA_TYPE :
compare_collations(charset(), cond->compare_collation()) ?
!charset()->eq_collation(cond->compare_collation()) ?
Data_type_compatibility::INCOMPATIBLE_COLLATION :
Data_type_compatibility::OK);
}
@ -7528,7 +7528,7 @@ Field_longstr::cmp_to_string_with_stricter_collation(const Item_bool_func *cond,
{
return (!cmp_is_done_using_type_handler_of_this(cond, item) ?
Data_type_compatibility::INCOMPATIBLE_DATA_TYPE :
(compare_collations(charset(), cond->compare_collation()) &&
(!charset()->eq_collation(cond->compare_collation()) &&
!(cond->compare_collation()->state & MY_CS_BINSORT) &&
!Utf8_narrow::should_do_narrowing(this, cond->compare_collation())) ?
Data_type_compatibility::INCOMPATIBLE_COLLATION :
@ -8447,7 +8447,7 @@ bool Field_varstring::is_equal(const Column_definition &new_field) const
new_field.length == field_length &&
new_field.char_length == char_length() &&
!new_field.compression_method() == !compression_method() &&
!compare_collations(new_field.charset, field_charset()));
new_field.charset->eq_collation(field_charset()));
}
@ -8714,7 +8714,7 @@ uint32 Field_blob::get_length(const uchar *pos, uint packlength_arg) const
*/
int Field_blob::copy_value(Field_blob *from)
{
DBUG_ASSERT(!compare_collations(field_charset(), from->charset()));
DBUG_ASSERT(field_charset()->eq_collation(from->charset()));
DBUG_ASSERT(!compression_method() == !from->compression_method());
int rc= 0;
uint32 length= from->get_length();
@ -9248,7 +9248,7 @@ bool Field_blob::is_equal(const Column_definition &new_field) const
return (new_field.type_handler() == type_handler() &&
!new_field.compression_method() == !compression_method() &&
new_field.pack_length == pack_length() &&
!compare_collations(new_field.charset, field_charset()));
new_field.charset->eq_collation(field_charset()));
}
@ -9746,7 +9746,7 @@ bool Field_enum::is_equal(const Column_definition &new_field) const
type, charset and have the same underlying length.
*/
if (new_field.type_handler() != type_handler() ||
compare_collations(new_field.charset, field_charset()) ||
!new_field.charset->eq_collation(field_charset()) ||
new_field.pack_length != pack_length())
return false;
@ -9853,7 +9853,7 @@ Field_enum::can_optimize_range_or_keypart_ref(const Item_bool_func *cond,
case REAL_RESULT:
return Data_type_compatibility::OK;
case STRING_RESULT:
return (!compare_collations(charset(), cond->compare_collation()) ?
return (charset()->eq_collation(cond->compare_collation()) ?
Data_type_compatibility::OK :
Data_type_compatibility::INCOMPATIBLE_COLLATION);
case ROW_RESULT:


@ -2568,7 +2568,7 @@ bool DTCollation::aggregate(const DTCollation &dt, uint flags)
}
else
{
if (!compare_collations(collation, dt.collation))
if (collation->eq_collation(dt.collation))
{
/* Do nothing */
}


@ -20,7 +20,8 @@ ${CMAKE_BINARY_DIR}/strings
SET(STRINGS_SOURCES bchange.c bmove_upp.c ctype-big5.c ctype-bin.c ctype-cp932.c
ctype-czech.c ctype-euc_kr.c ctype-eucjpms.c ctype-extra.c ctype-gb2312.c ctype-gbk.c
ctype-latin1.c ctype-mb.c ctype-simple.c ctype-sjis.c ctype-tis620.c ctype-uca.c
ctype-latin1.c ctype-mb.c ctype-simple.c ctype-sjis.c ctype-tis620.c
ctype-uca.c ctype-uca0900.c ctype-uca1400.c
ctype-ucs2.c ctype-ujis.c ctype-utf8.c ctype-win1250ch.c ctype.c decimal.c dtoa.c int2str.c
ctype-unidata.c
is_prefix.c llstr.c longlong2str.c my_strtoll10.c my_vsnprintf.c


@ -6730,7 +6730,8 @@ static MY_COLLATION_HANDLER my_collation_handler_big5_chinese_ci=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -6751,7 +6752,8 @@ static MY_COLLATION_HANDLER my_collation_handler_big5_bin=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -6772,7 +6774,8 @@ static MY_COLLATION_HANDLER my_collation_handler_big5_chinese_nopad_ci=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -6793,7 +6796,8 @@ static MY_COLLATION_HANDLER my_collation_handler_big5_nopad_bin=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -529,7 +529,8 @@ MY_COLLATION_HANDLER my_collation_8bit_bin_handler =
my_min_str_8bit_simple,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -550,7 +551,8 @@ MY_COLLATION_HANDLER my_collation_8bit_nopad_bin_handler =
my_min_str_8bit_simple_nopad,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -571,7 +573,8 @@ static MY_COLLATION_HANDLER my_collation_binary_handler =
my_min_str_8bit_simple_nopad,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -34687,7 +34687,8 @@ static MY_COLLATION_HANDLER my_collation_handler_cp932_japanese_ci=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -34708,7 +34709,8 @@ static MY_COLLATION_HANDLER my_collation_handler_cp932_bin=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -34729,7 +34731,8 @@ static MY_COLLATION_HANDLER my_collation_handler_cp932_japanese_nopad_ci=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -34750,7 +34753,8 @@ static MY_COLLATION_HANDLER my_collation_handler_cp932_nopad_bin=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -598,7 +598,8 @@ static MY_COLLATION_HANDLER my_collation_latin2_czech_cs_handler =
my_min_str_8bit_simple,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
struct charset_info_st my_charset_latin2_czech_cs =


@ -9977,7 +9977,8 @@ static MY_COLLATION_HANDLER my_collation_handler_euckr_korean_ci=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -9998,7 +9999,8 @@ static MY_COLLATION_HANDLER my_collation_handler_euckr_bin=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -10019,7 +10021,8 @@ static MY_COLLATION_HANDLER my_collation_handler_euckr_korean_nopad_ci=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -10040,7 +10043,8 @@ static MY_COLLATION_HANDLER my_collation_handler_euckr_nopad_bin=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -67515,7 +67515,8 @@ static MY_COLLATION_HANDLER my_collation_eucjpms_japanese_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -67536,7 +67537,8 @@ static MY_COLLATION_HANDLER my_collation_eucjpms_bin_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -67557,7 +67559,8 @@ static MY_COLLATION_HANDLER my_collation_eucjpms_japanese_nopad_ci_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -67578,7 +67581,8 @@ static MY_COLLATION_HANDLER my_collation_eucjpms_nopad_bin_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -6381,7 +6381,8 @@ static MY_COLLATION_HANDLER my_collation_handler_gb2312_chinese_ci=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -6402,7 +6403,8 @@ static MY_COLLATION_HANDLER my_collation_handler_gb2312_bin=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -6423,7 +6425,8 @@ static MY_COLLATION_HANDLER my_collation_handler_gb2312_chinese_nopad_ci=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -6444,7 +6447,8 @@ static MY_COLLATION_HANDLER my_collation_handler_gb2312_nopad_bin=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -10663,7 +10663,8 @@ static MY_COLLATION_HANDLER my_collation_handler_gbk_chinese_ci=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -10684,7 +10685,8 @@ static MY_COLLATION_HANDLER my_collation_handler_gbk_bin=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -10705,7 +10707,8 @@ static MY_COLLATION_HANDLER my_collation_handler_gbk_chinese_nopad_ci=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -10726,7 +10729,8 @@ static MY_COLLATION_HANDLER my_collation_handler_gbk_nopad_bin=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
static MY_CHARSET_HANDLER my_charset_handler=


@ -742,7 +742,8 @@ static MY_COLLATION_HANDLER my_collation_german2_ci_handler=
my_min_str_8bit_simple,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -2192,7 +2192,8 @@ MY_COLLATION_HANDLER my_collation_8bit_simple_ci_handler =
my_min_str_8bit_simple,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -2213,5 +2214,6 @@ MY_COLLATION_HANDLER my_collation_8bit_simple_nopad_ci_handler =
my_min_str_8bit_simple_nopad,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -34075,7 +34075,8 @@ static MY_COLLATION_HANDLER my_collation_handler_sjis_japanese_ci=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -34096,7 +34097,8 @@ static MY_COLLATION_HANDLER my_collation_handler_sjis_bin=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -34117,7 +34119,8 @@ static MY_COLLATION_HANDLER my_collation_handler_sjis_japanese_nopad_ci=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -34138,7 +34141,8 @@ static MY_COLLATION_HANDLER my_collation_handler_sjis_nopad_bin=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -881,7 +881,8 @@ static MY_COLLATION_HANDLER my_collation_ci_handler =
my_min_str_8bit_simple,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
static MY_COLLATION_HANDLER my_collation_nopad_ci_handler =
@ -901,7 +902,8 @@ static MY_COLLATION_HANDLER my_collation_nopad_ci_handler =
my_min_str_8bit_simple_nopad,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
static MY_CHARSET_HANDLER my_charset_handler=


@ -34,8 +34,8 @@
#include "strings_def.h"
#include <m_ctype.h>
#include <my_sys.h>
#include "ctype-uca.h"
#include "ctype-uca0520.h"
#include "ctype-unidata.h"
#include "my_bit.h"
@ -30214,78 +30214,6 @@ MY_UCA_INFO my_uca_v520=
};
#include "ctype-uca1400data.h"
static MY_UCA_INFO my_uca_v1400=
{
{
{
0x10FFFF, /* maxchar */
(uchar *) uca1400_length,
(uint16 **) uca1400_weight,
{ /* Contractions: */
array_elements(uca1400_contractions), /* nitems */
uca1400_contractions, /* item */
NULL /* flags */
},
0, /* levelno */
{0}, /* contraction_hash */
NULL /* booster */
},
{
0x10FFFF, /* maxchar */
(uchar *) uca1400_length_secondary,
(uint16 **) uca1400_weight_secondary,
{ /* Contractions: */
array_elements(uca1400_contractions_secondary), /* nitems */
uca1400_contractions_secondary, /* item */
NULL /* flags */
},
1, /* levelno */
{0}, /* contraction_hash */
NULL /* booster */
},
{
0x10FFFF, /* maxchar */
(uchar *) uca1400_length_tertiary,
(uint16 **) uca1400_weight_tertiary,
{ /* Contractions: */
array_elements(uca1400_contractions_tertiary), /* nitems */
uca1400_contractions_tertiary, /* item */
NULL /* flags */
},
2, /* levelno */
{0}, /* contraction_hash */
NULL /* booster */
}
},
uca1400_non_ignorable_first,
uca1400_non_ignorable_last,
uca1400_primary_ignorable_first,
uca1400_primary_ignorable_last,
uca1400_secondary_ignorable_first,
uca1400_secondary_ignorable_last,
uca1400_tertiary_ignorable_first,
uca1400_tertiary_ignorable_last,
0x0000, /* first_trailing */
0x0000, /* last_trailing */
uca1400_variable_first,
uca1400_variable_last,
/* Misc */
uca1400_version
};
/******************************************************/
/*
@ -31247,25 +31175,12 @@ static const char myanmar[]= "[shift-after-method expand]"
;
typedef struct my_uca1400_collation_definition_st
{
const char * tailoring;
const char * name;
uint16 id_utf8mb3;
uint16 id_utf8mb4;
uint16 id_ucs2;
uint16 id_utf16;
uint16 id_utf32;
} MY_UCA1400_COLLATION_DEFINITION;
/*
UCA1400 collation definitions in the order of their UCA400 counterparts,
with IDs of their closest UCA1400 counterparts, for character sets
utf8mb3, utf8mb4, ucs2, utf16, utf32.
*/
static MY_UCA1400_COLLATION_DEFINITION
MY_UCA1400_COLLATION_DEFINITION
my_uca1400_collation_definitions[MY_UCA1400_COLLATION_DEFINITION_COUNT]=
{
#define COLDEF(tl,name,id_utf8mb3,id_utf8mb4,id_ucs2,id_utf16,id_utf32) \
@ -31309,9 +31224,17 @@ my_uca1400_collation_definitions[MY_UCA1400_COLLATION_DEFINITION_COUNT]=
};
static MY_UCA_INFO
my_uca1400_info_tailored[MY_CS_ENCODING_LAST+1]
[MY_UCA1400_COLLATION_DEFINITION_COUNT];
static my_bool
my_ci_eq_collation_uca(CHARSET_INFO *a, CHARSET_INFO *b)
{
return a->cset == b->cset &&
a->coll == b->coll &&
a->uca == b->uca &&
a->casefold == b->casefold &&
(a->state & MY_CS_NOPAD) == (b->state & MY_CS_NOPAD) &&
a->levels_for_order == b->levels_for_order &&
a->tailoring == b->tailoring;
}
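The equality check above replaces a raw memcmp() over the tail of CHARSET_INFO with a per-handler callback that compares only the fields that affect sorting. A minimal sketch of that dispatch pattern follows. All `demo_*` names are hypothetical, the struct is a heavily reduced stand-in for CHARSET_INFO, and the comparison uses only a subset of the fields checked by my_ci_eq_collation_uca().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for CHARSET_INFO and its handler. */
struct demo_collation;

typedef struct demo_collation_handler
{
  /* Returns non-zero when both collations order strings identically. */
  int (*eq_collation)(const struct demo_collation *a,
                      const struct demo_collation *b);
} demo_collation_handler;

typedef struct demo_collation
{
  unsigned number;                    /* differs between alias and original */
  const char *coll_name;              /* differs between alias and original */
  const char *tailoring;              /* shared rule text, compared by address */
  unsigned levels_for_order;          /* bit mask of weight levels in use */
  const demo_collation_handler *coll; /* handler providing eq_collation */
} demo_collation;

/* Field-wise comparison: name and number are deliberately ignored. */
static int demo_eq_uca(const demo_collation *a, const demo_collation *b)
{
  return a->coll == b->coll &&
         a->levels_for_order == b->levels_for_order &&
         a->tailoring == b->tailoring;
}

static const demo_collation_handler demo_uca_handler= { demo_eq_uca };

/* Dispatch through the handler of the left-hand collation. */
static int demo_collations_equal(const demo_collation *a,
                                 const demo_collation *b)
{
  assert(a != NULL && b != NULL);
  return a->coll->eq_collation(a, b);
}
```

With this shape an alias and its original compare equal even though their names and IDs differ, and no assumption about struct layout or padding is needed.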
typedef struct my_uca_scanner_param_st
@ -34722,6 +34645,30 @@ my_uca_info_init(MY_CHARSET_LOADER *loader,
}
/*
Initialize (if needed) an element of the array my_uca1400_info_tailored[].
UCA1400 collations with equal character set and tailoring
(but with different level flags) share the same MY_UCA_INFO.
*/
static MY_UCA_INFO *
my_uca1400_collation_get_initialized_shared_uca(MY_CHARSET_LOADER *loader,
struct charset_info_st *cs,
MY_COLL_RULES *rules,
const MY_UCA_INFO *src_uca,
uint id)
{
my_cs_encoding_t enc= my_uca1400_collation_id_to_charset_id(id);
uint tailoring= my_uca1400_collation_id_to_tailoring_id(id);
MY_UCA_INFO *dst_uca= &my_uca1400_info_tailored[enc][tailoring];
DBUG_ASSERT(my_collation_id_is_uca1400(id));
if (!dst_uca->level[0].weights/*Check if already initialized*/ &&
(my_uca_info_init(loader, dst_uca, rules, cs, src_uca,
(1<<MY_UCA_WEIGHT_LEVELS)-1)))
return NULL; /* EOM or an error in rules */
return dst_uca;
}
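The helper above initializes a shared table slot only on first use. The same idea can be sketched in isolation, assuming a hypothetical two-dimensional table indexed by encoding and tailoring. The `demo_*` names and the trivial "expensive" step are illustrative only, and the sketch ignores any locking the real server may need.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_ENCODINGS= 5, DEMO_TAILORINGS= 4 };

typedef struct demo_uca_info
{
  const int *weights;   /* NULL until the slot has been initialized */
} demo_uca_info;

/* Zero-initialized static storage: every slot starts uninitialized. */
static demo_uca_info demo_shared[DEMO_ENCODINGS][DEMO_TAILORINGS];
static unsigned demo_init_calls= 0;

/* Stand-in for the expensive step; returns 0 on success. */
static int demo_expensive_init(demo_uca_info *dst)
{
  static const int demo_weights[]= {1, 2, 3};
  demo_init_calls++;
  dst->weights= demo_weights;
  return 0;
}

/* Return the shared slot, initializing it on first access only. */
static demo_uca_info *demo_get_shared(unsigned enc, unsigned tailoring)
{
  demo_uca_info *dst;
  assert(enc < DEMO_ENCODINGS && tailoring < DEMO_TAILORINGS);
  dst= &demo_shared[enc][tailoring];
  if (!dst->weights && demo_expensive_init(dst))
    return NULL;                      /* initialization failed */
  return dst;
}
```

Calling demo_get_shared(1, 2) twice returns the same pointer and runs the initializer once, which mirrors how collations that differ only in level flags can share one MY_UCA_INFO.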
/*
This function copies an UCS2 collation from
the default Unicode Collation Algorithm (UCA)
@ -34789,20 +34736,23 @@ create_tailoring(struct charset_info_st *cs,
my_ci_set_strength(cs, 1);
if (my_collation_id_is_uca1400(cs->number))
if (my_collation_id_is_mysql_uca0900(cs->number))
{
/*
UCA1400 collations with equal character set and tailoring
(but with different level flags) share the same MY_UCA_INFO.
*/
my_cs_encoding_t enc= my_uca1400_collation_id_to_charset_id(cs->number);
uint tailoring= my_uca1400_collation_id_to_tailoring_id(cs->number);
MY_UCA_INFO *dst_uca= &my_uca1400_info_tailored[enc][tailoring];
if (!dst_uca->level[0].weights &&
(rc= my_uca_info_init(loader, dst_uca, &rules, cs, src_uca,
(1<<MY_UCA_WEIGHT_LEVELS)-1)))
goto ex;
cs->uca= dst_uca;
uint id1400= mysql_0900_mapping[cs->number - mysql_0900_collation_start].
collation_id;
if (!(cs->uca= my_uca1400_collation_get_initialized_shared_uca(loader, cs,
&rules,
src_uca,
id1400)))
goto ex;
}
else if (my_collation_id_is_uca1400(cs->number))
{
if (!(cs->uca= my_uca1400_collation_get_initialized_shared_uca(loader, cs,
&rules,
src_uca,
cs->number)))
goto ex;
}
else
{
@ -39397,122 +39347,6 @@ struct charset_info_st my_charset_utf16_unicode_520_nopad_ci=
#endif /* HAVE_CHARSET_utf16 */
uint
my_uca1400_make_builtin_collation_id(my_cs_encoding_t charset_id,
uint tailoring_id,
my_bool nopad,
my_bool secondary_level,
my_bool tertiary_level)
{
if (!my_uca1400_collation_definitions[tailoring_id].tailoring)
return 0;
return MY_UCA1400_COLLATION_ID_POSSIBLE_MIN +
(charset_id << 8) +
(tailoring_id << 3) +
(nopad << 2) +
(secondary_level << 1) +
(tertiary_level << 0);
}
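The ID layout used by my_uca1400_make_builtin_collation_id() can be illustrated with a small standalone sketch. It assumes MY_UCA1400_COLLATION_ID_POSSIBLE_MIN is 2048 and that utf8mb4 has encoding ID 1. Neither value appears in this diff, but both are consistent with the IDs in mysql_0900_mapping (for example 2308 for the root nopad_ai_ci collation). The `demo_*` decoders are hypothetical counterparts, not the server's functions.

```c
#include <assert.h>

/* Assumed base value, inferred from the IDs in mysql_0900_mapping. */
#define DEMO_ID_MIN 2048u

/* Pack the fields: charset in bits 8+, tailoring in bits 3..7, flags in 0..2. */
static unsigned demo_make_id(unsigned charset_id, unsigned tailoring_id,
                             unsigned nopad, unsigned secondary,
                             unsigned tertiary)
{
  assert(tailoring_id < 32 && nopad <= 1 && secondary <= 1 && tertiary <= 1);
  return DEMO_ID_MIN + (charset_id << 8) + (tailoring_id << 3) +
         (nopad << 2) + (secondary << 1) + tertiary;
}

/* Hypothetical decoders that reverse the packing above. */
static unsigned demo_id_to_charset(unsigned id)
{
  return (id - DEMO_ID_MIN) >> 8;
}

static unsigned demo_id_to_tailoring(unsigned id)
{
  return ((id - DEMO_ID_MIN) >> 3) & 31u;
}

static unsigned demo_id_to_nopad(unsigned id)
{
  return ((id - DEMO_ID_MIN) >> 2) & 1u;
}
```

Under these assumptions, demo_make_id(1, 0, 1, 0, 0) gives 2308 and demo_make_id(1, 0, 1, 1, 1) gives 2311, matching the first "ai_ci" and "as_cs" rows of the mapping table.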
my_bool
my_uca1400_collation_definition_init(MY_CHARSET_LOADER *loader,
struct charset_info_st *dst,
uint id)
{
my_cs_encoding_t cs_id= my_uca1400_collation_id_to_charset_id(id);
uint tailoring_id= my_uca1400_collation_id_to_tailoring_id(id);
my_bool nopad= my_uca1400_collation_id_to_nopad_flag(id);
my_bool secondary_level= my_uca1400_collation_id_to_secondary_level_flag(id);
my_bool tertiary_level= my_uca1400_collation_id_to_tertiary_level_flag(id);
const MY_UCA1400_COLLATION_DEFINITION *def=
&my_uca1400_collation_definitions[tailoring_id];
char tmp[128], *coll_name;
size_t length;
switch (cs_id) {
case MY_CS_ENCODING_UTF8MB3:
*dst= nopad ? my_charset_utf8mb3_unicode_520_nopad_ci :
my_charset_utf8mb3_unicode_520_ci;
break;
case MY_CS_ENCODING_UTF8MB4:
*dst= nopad ? my_charset_utf8mb4_unicode_520_nopad_ci :
my_charset_utf8mb4_unicode_520_ci;
break;
#ifdef HAVE_CHARSET_ucs2
case MY_CS_ENCODING_UCS2:
*dst= nopad ? my_charset_ucs2_unicode_520_nopad_ci :
my_charset_ucs2_unicode_520_ci;
break;
#endif
#ifdef HAVE_CHARSET_utf16
case MY_CS_ENCODING_UTF16:
*dst= nopad ? my_charset_utf16_unicode_520_nopad_ci :
my_charset_utf16_unicode_520_ci;
break;
#endif
#ifdef HAVE_CHARSET_utf32
case MY_CS_ENCODING_UTF32:
*dst= nopad ? my_charset_utf32_unicode_520_nopad_ci :
my_charset_utf32_unicode_520_ci;
break;
#endif
}
dst->number= id;
dst->uca= &my_uca_v1400;
dst->tailoring= def->tailoring;
if (def->tailoring == turkish)
dst->casefold= &my_casefold_unicode1400tr;
else
dst->casefold= &my_casefold_unicode1400;
if (nopad)
dst->state|= MY_CS_NOPAD;
my_ci_set_level_flags(dst, (1 << MY_CS_LEVEL_BIT_PRIMARY) |
(secondary_level ?
1 << MY_CS_LEVEL_BIT_SECONDARY : 0) |
(tertiary_level ?
1 << MY_CS_LEVEL_BIT_TERTIARY : 0));
length= my_snprintf(tmp, sizeof(tmp), "%.*s_uca1400%s%s%s%s%s",
(int) dst->cs_name.length, dst->cs_name.str,
def->name[0] ? "_" : "",
def->name,
nopad ? "_nopad" : "",
secondary_level ? "_as" : "_ai",
tertiary_level ? "_cs" : "_ci");
if (!(coll_name= loader->once_alloc(length + 1)))
return TRUE;
strcpy(coll_name, tmp);
dst->coll_name.str= coll_name;
dst->coll_name.length= length;
return FALSE;
}
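The name-building step in my_uca1400_collation_definition_init() can be sketched with standard snprintf() instead of my_snprintf(). The function name `demo_uca1400_name` and its error convention are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
  Build "<charset>_uca1400[_<tailoring>][_nopad]_{ai|as}_{ci|cs}".
  Returns the name length, or -1 if the buffer is too small.
*/
static int demo_uca1400_name(char *dst, size_t dst_size,
                             const char *cs_name, const char *tailoring_name,
                             int nopad, int secondary, int tertiary)
{
  int n;
  assert(dst != NULL && cs_name != NULL && tailoring_name != NULL);
  n= snprintf(dst, dst_size, "%s_uca1400%s%s%s%s%s",
              cs_name,
              tailoring_name[0] ? "_" : "",   /* separator only if named */
              tailoring_name,
              nopad ? "_nopad" : "",
              secondary ? "_as" : "_ai",
              tertiary ? "_cs" : "_ci");
  return (n < 0 || (size_t) n >= dst_size) ? -1 : n;
}
```

For instance, the arguments ("utf8mb4", "polish", 1, 0, 0) produce "utf8mb4_uca1400_polish_nopad_ai_ci", while an empty tailoring name yields the root collation name without an extra underscore.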
/*
Return UCA-4.0.0 compatible ID, e.g. for use in the protocol
with the old clients.
*/
static uint my_uca1400_collation_id_uca400_compat(uint id)
{
uint tlid= my_uca1400_collation_id_to_tailoring_id(id);
my_cs_encoding_t csid= my_uca1400_collation_id_to_charset_id(id);
MY_UCA1400_COLLATION_DEFINITION *def;
DBUG_ASSERT(my_collation_id_is_uca1400(id));
if (!(def= &my_uca1400_collation_definitions[tlid])->name)
return id;
switch (csid) {
case MY_CS_ENCODING_UTF8MB3: return def->id_utf8mb3;
case MY_CS_ENCODING_UTF8MB4: return def->id_utf8mb4;
case MY_CS_ENCODING_UCS2: return def->id_ucs2;
case MY_CS_ENCODING_UTF16: return def->id_utf16;
case MY_CS_ENCODING_UTF32: return def->id_utf32;
}
return id;
}
uint my_ci_get_id_uca(CHARSET_INFO *cs, my_collation_id_type_t type)
{
switch (type)
@ -39532,23 +39366,6 @@ uint my_ci_get_id_uca(CHARSET_INFO *cs, my_collation_id_type_t type)
}
LEX_CSTRING my_ci_get_collation_name_uca1400_context(CHARSET_INFO *cs)
{
LEX_CSTRING res;
DBUG_ASSERT(my_collation_id_is_uca1400(cs->number));
if (cs->coll_name.length <= cs->cs_name.length ||
cs->coll_name.str[cs->cs_name.length] != '_')
{
DBUG_ASSERT(0);
return cs->coll_name;
}
res.str= cs->coll_name.str + cs->cs_name.length + 1;
res.length= cs->coll_name.length - cs->cs_name.length - 1;
return res;
}
LEX_CSTRING my_ci_get_collation_name_uca(CHARSET_INFO *cs,
my_collation_name_mode_t mode)
{
@ -39565,137 +39382,4 @@ LEX_CSTRING my_ci_get_collation_name_uca(CHARSET_INFO *cs,
return cs->coll_name;
}
/*
Add support for MySQL 8.0 utf8mb4_0900_.. collations
The collation IDs were collected from fprintf() in add_alias_for_collation()
*/
#define mysql_0900_collation_start 255
struct mysql_0900_to_mariadb_1400_mapping
{
const char *mysql_col_name, *mariadb_col_name, *case_sensitivity;
uint collation_id;
};
struct mysql_0900_to_mariadb_1400_mapping mysql_0900_mapping[]=
{
/* 255 Accent insensitive, Case insensitive 'ai_ci' */
{"", "", "ai_ci", 2308},
{"de_pb", "german2", "ai_ci", 2468},
{"is", "icelandic", "ai_ci", 2316},
{"lv", "latvian", "ai_ci", 2324},
{"ro", "romanian", "ai_ci", 2332},
{"sl", "slovenian", "ai_ci", 2340},
{"pl", "polish", "ai_ci", 2348},
{"et", "estonian", "ai_ci", 2356},
{"es", "spanish", "ai_ci", 2364},
{"sv", "swedish", "ai_ci", 2372},
{"tr", "turkish", "ai_ci", 2380},
{"cs", "czech", "ai_ci", 2388},
{"da", "danish", "ai_ci", 2396},
{"lt", "lithuanian", "ai_ci", 2404},
{"sk", "slovak", "ai_ci", 2412},
{"es_trad", "spanish2", "ai_ci", 2420},
{"la", "roman", "ai_ci", 2428},
{"fa", NullS, "ai_ci", 0}, // Disabled in MySQL
{"eo", "esperanto", "ai_ci", 2444},
{"hu", "hungarian", "ai_ci", 2452},
{"hr", "croatian", "ai_ci", 2500},
{"si", NullS, "ai_ci", 0}, // Disabled in MySQL
{"vi", "vietnamese", "ai_ci", 2492},
/* 278 Accent sensitive, Case sensitive 'as_cs' */
{"","", "as_cs", 2311},
{"de_pb", "german2", "as_cs", 2471},
{"is", "icelandic", "as_cs", 2319},
{"lv", "latvian", "as_cs", 2327},
{"ro", "romanian", "as_cs", 2335},
{"sl", "slovenian", "as_cs", 2343},
{"pl", "polish", "as_cs", 2351},
{"et", "estonian", "as_cs", 2359},
{"es", "spanish", "as_cs", 2367},
{"sv", "swedish", "as_cs", 2375},
{"tr", "turkish", "as_cs", 2383},
{"cs", "czech", "as_cs", 2391},
{"da", "danish", "as_cs", 2399},
{"lt", "lithuanian", "as_cs", 2407},
{"sk", "slovak", "as_cs", 2415},
{"es_trad", "spanish2", "as_cs", 2423},
{"la", "roman", "as_cs", 2431},
{"fa", NullS, "as_cs", 0}, // Disabled in MySQL
{"eo", "esperanto", "as_cs", 2447},
{"hu", "hungarian", "as_cs", 2455},
{"hr", "croatian", "as_cs", 2503},
{"si", NullS, "as_cs", 0}, // Disabled in MySQL
{"vi", "vietnamese", "as_cs", 2495},
{"", NullS, "as_cs", 0}, // Missing
{"", NullS, "as_cs", 0}, // Missing
{"_ja_0900_as_cs", NullS, "as_cs", 0}, // Not supported
{"_ja_0900_as_cs_ks", NullS, "as_cs", 0}, // Not supported
/* 305 Accent sensitive, Case insensitive 'as_ci' */
{"","", "as_ci", 2310},
{"ru", NullS, "ai_ci", 0}, // Not supported
{"ru", NullS, "as_cs", 0}, // Not supported
{"zh", NullS, "as_cs", 0}, // Not supported
{NullS, NullS, "", 0}
};
static LEX_CSTRING
mysql_utf8mb4_0900_bin= {STRING_WITH_LEN("utf8mb4_0900_bin")},
mariadb_utf8mb4_nopad_bin= {STRING_WITH_LEN("utf8mb4_nopad_bin")};
/*
Map MySQL character sets to MariaDB using the same definition but
with the MySQL collation name and id.
*/
my_bool mysql_utf8mb4_0900_collation_definitions_add()
{
uint id= mysql_0900_collation_start;
struct mysql_0900_to_mariadb_1400_mapping *map;
for (map= mysql_0900_mapping; map->mysql_col_name ; map++, id++)
{
if (map->mariadb_col_name) /* Supported collation */
{
size_t org_length, ali_length;
char original[64], alias[64];
LEX_CSTRING org_name, alias_name;
org_length= (strxnmov(original, sizeof(original)-1,
"utf8mb4_uca1400_",
map->mariadb_col_name,
(map->mariadb_col_name[0] ? "_" : ""),
"nopad_",
map->case_sensitivity,
NullS) - original);
ali_length= (strxnmov(alias, sizeof(alias)-1,
"utf8mb4_", map->mysql_col_name,
(map->mysql_col_name[0] ? "_" : ""),
"0900_",
map->case_sensitivity,
NullS) - alias);
org_name.str= original;
org_name.length= org_length;
alias_name.str= alias;
alias_name.length= ali_length;
if (add_alias_for_collation(&org_name, map->collation_id, &alias_name,
id))
return 1;
}
}
if (add_alias_for_collation(&mariadb_utf8mb4_nopad_bin, 1070,
&mysql_utf8mb4_0900_bin, 309))
return 1;
return 0;
}
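The table walk above relies on a positional convention: the row at index i describes the MySQL collation with ID 255 + i, and a NULL MariaDB name marks an unsupported entry. A minimal lookup sketch under that convention follows. The `demo_*` names are hypothetical and the table is shortened to the first three real rows plus one unsupported row, so the position of the "fa" row here does not match its position in the real table.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_0900_START 255u  /* mirrors mysql_0900_collation_start */

typedef struct demo_mapping
{
  const char *mysql_name;     /* language tag used in the _0900_ name */
  const char *mariadb_name;   /* NULL when there is no counterpart */
  unsigned uca1400_id;        /* 0 when unsupported */
} demo_mapping;

/* Shortened illustrative table; the last row is placed here for the demo. */
static const demo_mapping demo_map[]=
{
  {"",      "",          2308},
  {"de_pb", "german2",   2468},
  {"is",    "icelandic", 2316},
  {"fa",    NULL,        0}
};

/* Return the _uca1400_ collation ID for a _0900_ ID, or 0 if none. */
static unsigned demo_0900_to_uca1400(unsigned id0900)
{
  size_t idx;
  if (id0900 < DEMO_0900_START)
    return 0;
  idx= id0900 - DEMO_0900_START;
  if (idx >= sizeof(demo_map) / sizeof(demo_map[0]))
    return 0;
  /* Table invariant: a supported row always carries a non-zero ID. */
  assert(demo_map[idx].mariadb_name == NULL || demo_map[idx].uca1400_id != 0);
  return demo_map[idx].mariadb_name ? demo_map[idx].uca1400_id : 0;
}
```

A lookup of 256 returns 2468 in this sketch, which corresponds to the way create_tailoring() indexes mysql_0900_mapping with cs->number - mysql_0900_collation_start.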
#endif /* HAVE_UCA_COLLATIONS */


@ -118,9 +118,18 @@ typedef enum my_cs_encoding_enum
#define MY_CS_ENCODING_LAST MY_CS_ENCODING_UTF32
#include "ctype-uca1400.h"
typedef struct uca_collation_def_param
{
my_cs_encoding_t cs_id;
uint tailoring_id;
uint nopad_flags;
uint level_flags;
} uca_collation_def_param_t;
#include "ctype-uca1400.h"
#include "ctype-uca0900.h"
static inline MY_UCA_IMPLICIT_WEIGHT
my_uca_implicit_weight_primary(uint version, my_wc_t code)
{


@ -963,7 +963,8 @@ MY_COLLATION_HANDLER MY_FUNCTION_NAME(collation_handler)=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_uca,
my_ci_get_collation_name_uca
my_ci_get_collation_name_uca,
my_ci_eq_collation_uca
};
@ -989,7 +990,8 @@ MY_COLLATION_HANDLER MY_FUNCTION_NAME(collation_handler_nopad)=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_uca,
my_ci_get_collation_name_uca
my_ci_get_collation_name_uca,
my_ci_eq_collation_uca
};
@ -1013,7 +1015,8 @@ MY_COLLATION_HANDLER MY_FUNCTION_NAME(collation_handler_multilevel)=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_uca,
my_ci_get_collation_name_uca
my_ci_get_collation_name_uca,
my_ci_eq_collation_uca
};
@ -1037,7 +1040,8 @@ MY_COLLATION_HANDLER MY_FUNCTION_NAME(collation_handler_nopad_multilevel)=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_uca,
my_ci_get_collation_name_uca
my_ci_get_collation_name_uca,
my_ci_eq_collation_uca
};

strings/ctype-uca0520.h (new file, 69 lines)

@ -0,0 +1,69 @@
#ifndef CTYPE_UCA_0520_H
#define CTYPE_UCA_0520_H
/* Copyright (c) 2025, MariaDB Corporation
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; version 2
of the License.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free
Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
MA 02110-1335 USA */
extern struct charset_info_st my_charset_utf8mb3_unicode_520_nopad_ci;
extern struct charset_info_st my_charset_utf8mb3_unicode_520_ci;
extern struct charset_info_st my_charset_utf8mb4_unicode_520_nopad_ci;
extern struct charset_info_st my_charset_utf8mb4_unicode_520_ci;
extern struct charset_info_st my_charset_ucs2_unicode_520_nopad_ci;
extern struct charset_info_st my_charset_ucs2_unicode_520_ci;
extern struct charset_info_st my_charset_utf16_unicode_520_nopad_ci;
extern struct charset_info_st my_charset_utf16_unicode_520_ci;
extern struct charset_info_st my_charset_utf32_unicode_520_nopad_ci;
extern struct charset_info_st my_charset_utf32_unicode_520_ci;
extern struct charset_info_st my_charset_utf8mb4_turkish_uca_ci;
/*
Get a UCA-5.2.0 CHARSET_INFO using its character set ID and PAD flags.
Used to initialize UCA-14.0.0 collations.
*/
static inline
CHARSET_INFO *my_uca0520_builtin_collation_by_id(my_cs_encoding_t cs_id,
uint nopad_flags)
{
switch (cs_id) {
case MY_CS_ENCODING_UTF8MB3:
return nopad_flags ? &my_charset_utf8mb3_unicode_520_nopad_ci :
&my_charset_utf8mb3_unicode_520_ci;
case MY_CS_ENCODING_UTF8MB4:
return nopad_flags ? &my_charset_utf8mb4_unicode_520_nopad_ci :
&my_charset_utf8mb4_unicode_520_ci;
#ifdef HAVE_CHARSET_ucs2
case MY_CS_ENCODING_UCS2:
return nopad_flags ? &my_charset_ucs2_unicode_520_nopad_ci :
&my_charset_ucs2_unicode_520_ci;
#endif
#ifdef HAVE_CHARSET_utf16
case MY_CS_ENCODING_UTF16:
return nopad_flags ? &my_charset_utf16_unicode_520_nopad_ci :
&my_charset_utf16_unicode_520_ci;
#endif
#ifdef HAVE_CHARSET_utf32
case MY_CS_ENCODING_UTF32:
return nopad_flags ? &my_charset_utf32_unicode_520_nopad_ci :
&my_charset_utf32_unicode_520_ci;
#endif
}
return NULL;
}
#endif /* CTYPE_UCA_0520_H */

strings/ctype-uca0900.c (new file, 221 lines)

@ -0,0 +1,221 @@
/* Copyright (c) 2025, MariaDB Corporation
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; version 2
of the License.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free
Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
MA 02110-1335 USA */
#include "my_global.h"
#include "strings_def.h"
#include "ctype-uca.h"
struct mysql_0900_to_mariadb_1400_mapping
mysql_0900_mapping[mysql_0900_collation_num]=
{
/* 255 Accent insensitive, Case insensitive 'ai_ci' */
{"", "", "ai_ci", 2308},
{"de_pb", "german2", "ai_ci", 2468},
{"is", "icelandic", "ai_ci", 2316},
{"lv", "latvian", "ai_ci", 2324},
{"ro", "romanian", "ai_ci", 2332},
{"sl", "slovenian", "ai_ci", 2340},
{"pl", "polish", "ai_ci", 2348},
{"et", "estonian", "ai_ci", 2356},
{"es", "spanish", "ai_ci", 2364},
{"sv", "swedish", "ai_ci", 2372},
{"tr", "turkish", "ai_ci", 2380},
{"cs", "czech", "ai_ci", 2388},
{"da", "danish", "ai_ci", 2396},
{"lt", "lithuanian", "ai_ci", 2404},
{"sk", "slovak", "ai_ci", 2412},
{"es_trad", "spanish2", "ai_ci", 2420},
{"la", "roman", "ai_ci", 2428},
{"fa", NullS, "ai_ci", 0}, // Disabled in MySQL
{"eo", "esperanto", "ai_ci", 2444},
{"hu", "hungarian", "ai_ci", 2452},
{"hr", "croatian", "ai_ci", 2500},
{"si", NullS, "ai_ci", 0}, // Disabled in MySQL
{"vi", "vietnamese", "ai_ci", 2492},
/* 278 Accent sensitive, Case sensitive 'as_cs' */
{"","", "as_cs", 2311},
{"de_pb", "german2", "as_cs", 2471},
{"is", "icelandic", "as_cs", 2319},
{"lv", "latvian", "as_cs", 2327},
{"ro", "romanian", "as_cs", 2335},
{"sl", "slovenian", "as_cs", 2343},
{"pl", "polish", "as_cs", 2351},
{"et", "estonian", "as_cs", 2359},
{"es", "spanish", "as_cs", 2367},
{"sv", "swedish", "as_cs", 2375},
{"tr", "turkish", "as_cs", 2383},
{"cs", "czech", "as_cs", 2391},
{"da", "danish", "as_cs", 2399},
{"lt", "lithuanian", "as_cs", 2407},
{"sk", "slovak", "as_cs", 2415},
{"es_trad", "spanish2", "as_cs", 2423},
{"la", "roman", "as_cs", 2431},
{"fa", NullS, "as_cs", 0}, // Disabled in MySQL
{"eo", "esperanto", "as_cs", 2447},
{"hu", "hungarian", "as_cs", 2455},
{"hr", "croatian", "as_cs", 2503},
{"si", NullS, "as_cs", 0}, // Disabled in MySQL
{"vi", "vietnamese", "as_cs", 2495},
{"", NullS, "as_cs", 0}, // Missing
{"", NullS, "as_cs", 0}, // Missing
{"_ja_0900_as_cs", NullS, "as_cs", 0}, // Not supported
{"_ja_0900_as_cs_ks", NullS, "as_cs", 0}, // Not supported
/* 305 Accent sensitive, Case insensitive 'as_ci' */
{"","", "as_ci", 2310},
{"ru", NullS, "ai_ci", 0}, // Not supported
{"ru", NullS, "as_cs", 0}, // Not supported
{"zh", NullS, "as_cs", 0}, // Not supported
{NullS, NullS, "", 0}
};
static LEX_CSTRING
my_uca0900_collation_build_name(char *buffer, size_t buffer_size,
const char *cs_name,
const char *tailoring_name,
const char *sensitivity_suffix)
{
LEX_CSTRING res;
DBUG_ASSERT(buffer_size > 1);
res.str= buffer;
res.length= (strxnmov(buffer, buffer_size - 1,
cs_name, "_", tailoring_name,
(tailoring_name[0] ? "_" : ""),
"0900_",
sensitivity_suffix,
NullS) - buffer);
return res;
}
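The name-building rule above can be tried in isolation. The following is a minimal sketch, not the server code: `build_0900_name` is a hypothetical stand-in for `my_uca0900_collation_build_name()`, and it uses the standard `snprintf()` instead of `strxnmov()` so that it compiles without the MariaDB headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
  Hypothetical stand-in for my_uca0900_collation_build_name().
  Joins the character set name, an optional tailoring name, "0900"
  and the sensitivity suffix. Returns the length of the result.
*/
static size_t build_0900_name(char *buffer, size_t buffer_size,
                              const char *cs_name,
                              const char *tailoring_name,
                              const char *sensitivity_suffix)
{
  int n;
  assert(buffer_size > 1);
  /* The extra "_" is emitted only when a tailoring name is present */
  n= snprintf(buffer, buffer_size, "%s_%s%s0900_%s",
              cs_name, tailoring_name,
              tailoring_name[0] ? "_" : "",
              sensitivity_suffix);
  /* This sketch treats truncation as a programming error */
  assert(n >= 0 && (size_t) n < buffer_size);
  return (size_t) n;
}
```

With an empty tailoring name the call yields `utf8mb4_0900_ai_ci`; with the tailoring `sv` and the suffix `as_cs` it yields `utf8mb4_sv_0900_as_cs`.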
static LEX_CSTRING
my_ci_make_comment_for_alias(char *buffer, size_t buffer_size,
const char *srcname)
{
LEX_CSTRING res= {buffer, 0};
DBUG_ASSERT(buffer_size > 0);
res.length= strxnmov(buffer, buffer_size - 1, "Alias for ", srcname, NullS) -
buffer;
return res;
}
/*
Add a MySQL UCA-0900 collation as an alias for a MariaDB UCA-1400 collation.
*/
static my_bool
mysql_uca0900_collation_definition_add(MY_CHARSET_LOADER *loader,
const struct
mysql_0900_to_mariadb_1400_mapping *map,
uint alias_id)
{
char comment_buffer[MY_CS_COLLATION_NAME_SIZE + 15];
char alias_buffer[MY_CS_COLLATION_NAME_SIZE + 1];
char name1400_buffer[MY_CS_COLLATION_NAME_SIZE + 1];
LEX_CSTRING comment= {comment_buffer, 0};
LEX_CSTRING alias_name= {alias_buffer, 0};
LEX_CSTRING name1400= {name1400_buffer, 0};
LEX_CSTRING utf8mb4= {STRING_WITH_LEN("utf8mb4")};
uint id1400= map->collation_id;
uca_collation_def_param_t param= my_uca1400_collation_param_by_id(id1400);
const MY_UCA1400_COLLATION_DEFINITION *def1400=
&my_uca1400_collation_definitions[param.tailoring_id];
DBUG_ASSERT(my_collation_id_is_mysql_uca0900(alias_id));
alias_name= my_uca0900_collation_build_name(alias_buffer,
sizeof(alias_buffer),
"utf8mb4",
map->mysql_col_name,
map->case_sensitivity);
name1400= my_uca1400_collation_build_name(name1400_buffer,
sizeof(name1400_buffer),
&utf8mb4, def1400->name, &param);
comment= my_ci_make_comment_for_alias(comment_buffer, sizeof(comment_buffer),
name1400.str);
#ifdef DEBUG_PRINT_ALIAS
fprintf(stderr, "alias[%u] %-26s -> [%u] %s\n",
alias_id, alias_name.str, id1400, name1400.str);
#endif
return my_uca1400_collation_alloc_and_init(loader, alias_name,
comment, &param, alias_id);
}
/*
Add support for MySQL 8.0 utf8mb4_0900_.. UCA collations.
The collation IDs were collected from the fprintf() output
in mysql_uca0900_collation_definition_add().
Each MySQL collation is mapped to a MariaDB collation using the same
definition, but with the MySQL collation name and ID.
*/
my_bool
mysql_uca0900_utf8mb4_collation_definitions_add(MY_CHARSET_LOADER *loader)
{
uint alias_id= mysql_0900_collation_start;
struct mysql_0900_to_mariadb_1400_mapping *map;
for (map= mysql_0900_mapping; map->mysql_col_name ; map++, alias_id++)
{
if (map->mariadb_col_name) /* Supported collation */
{
if (mysql_uca0900_collation_definition_add(loader, map, alias_id))
return TRUE;
}
}
return FALSE;
}
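One detail of the loop above is easy to miss: `alias_id` advances for every row of the table, including the unsupported ones, so the skipped MySQL IDs simply stay unused. A minimal sketch of that behaviour follows. The struct, the function and the three-row table are hypothetical and shortened; the rows are borrowed from the real table but are not in their real positions, so the IDs produced here are for illustration only.

```c
#include <assert.h>
#include <stddef.h>

struct demo_alias_map          /* hypothetical, shortened mapping entry */
{
  const char *mysql_name;      /* NULL marks the end of the table */
  const char *mariadb_name;    /* NULL means "not supported" */
  unsigned id1400;
};

/* Demo table: not the real row order, see the note above */
static const struct demo_alias_map demo_map[]=
{
  {"",   "",          2308},
  {"fa", NULL,        0},      /* unsupported, still consumes an ID */
  {"eo", "esperanto", 2444},
  {NULL, NULL,        0}
};

/*
  Walk the table the same way the loop above does.
  Stores the alias IDs of the supported rows into out[] and
  returns how many were stored.
*/
static size_t demo_collect_alias_ids(const struct demo_alias_map *map,
                                     unsigned start_id,
                                     unsigned *out, size_t out_size)
{
  size_t n= 0;
  unsigned alias_id= start_id;
  assert(out != NULL);
  for ( ; map->mysql_name; map++, alias_id++)
  {
    if (!map->mariadb_name)
      continue;                /* the for-increment still runs */
    assert(n < out_size);
    out[n++]= alias_id;
  }
  return n;
}
```

Starting at 255, the demo table yields the IDs 255 and 257; 256 is consumed by the unsupported row.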
/*
Add MySQL utf8mb4_0900_bin collation as
an alias for MariaDB utf8mb4_nopad_bin.
*/
my_bool mysql_utf8mb4_0900_bin_add(MY_CHARSET_LOADER *loader)
{
CHARSET_INFO *src= &my_charset_utf8mb4_nopad_bin;
LEX_CSTRING alias_name= {STRING_WITH_LEN("utf8mb4_0900_bin")};
uint alias_id= 309;
char comment_buffer[MY_CS_COLLATION_NAME_SIZE+15];
LEX_CSTRING comment= my_ci_make_comment_for_alias(comment_buffer,
sizeof(comment_buffer),
src->coll_name.str);
struct charset_info_st *dst= my_ci_alloc(loader, alias_name, &alias_name,
comment, &comment);
if (!dst)
return TRUE;
*dst= *src;
dst->number= alias_id;
dst->coll_name= alias_name;
dst->comment= comment.str;
(loader->add_collation)(dst);
return FALSE;
}

strings/ctype-uca0900.h Normal file

@ -0,0 +1,47 @@
#ifndef CTYPE_UCA_0900_H
#define CTYPE_UCA_0900_H
/* Copyright (c) 2025, MariaDB Corporation
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; version 2
of the License.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free
Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
MA 02110-1335 USA */
#define mysql_0900_collation_start 255
#define mysql_0900_collation_end 308
#define mysql_0900_collation_num \
(mysql_0900_collation_end - mysql_0900_collation_start + 1 + 1/*End marker*/)
struct mysql_0900_to_mariadb_1400_mapping
{
const char *mysql_col_name, *mariadb_col_name, *case_sensitivity;
uint collation_id;
};
extern struct mysql_0900_to_mariadb_1400_mapping
mysql_0900_mapping[mysql_0900_collation_num];
static inline
my_bool my_collation_id_is_mysql_uca0900(uint id)
{
return id >= mysql_0900_collation_start &&
id <= mysql_0900_collation_end;
}
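The size arithmetic in the header can be checked against the mapping table: IDs 255..308 give 54 rows, and one more slot holds the end marker. The sketch below restates the macros under `DEMO_` names (hypothetical, to avoid clashing with the real ones). The index helper is an assumption derived from the loop in ctype-uca0900.c, where the table position and the alias ID advance together; it is not a function from the patch.

```c
#include <assert.h>

#define DEMO_0900_START 255u
#define DEMO_0900_END   308u
/* IDs in the range, plus one slot for the end marker row */
#define DEMO_0900_NUM   (DEMO_0900_END - DEMO_0900_START + 1 + 1)

static int demo_id_is_0900(unsigned id)
{
  return id >= DEMO_0900_START && id <= DEMO_0900_END;
}

/* Position of an alias ID inside the mapping table (derived, see above) */
static unsigned demo_0900_table_index(unsigned id)
{
  assert(demo_id_is_0900(id));
  return id - DEMO_0900_START;
}
```

For example, ID 278 (the first `as_cs` row) lands at index 23, after the 23 `ai_ci` rows, while ID 309 (used for `utf8mb4_0900_bin`) lies outside the range.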
my_bool mysql_uca0900_utf8mb4_collation_definitions_add(MY_CHARSET_LOADER *ld);
my_bool mysql_utf8mb4_0900_bin_add(MY_CHARSET_LOADER *loader);
#endif /* CTYPE_UCA_0900_H */

strings/ctype-uca1400.c Normal file

@ -0,0 +1,363 @@
/* Copyright (c) 2025, MariaDB Corporation
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; version 2
of the License.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with this library; if not, write to the Free
Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
MA 02110-1335 USA */
#include "strings_def.h"
#include "m_ctype.h"
#include "ctype-uca.h"
#include "ctype-uca0520.h"
#include "ctype-unidata.h"
#include "ctype-uca1400data.h"
/*
Return the UCA-4.0.0 compatible ID (known since MySQL-4.1),
e.g. for use in the protocol with old clients.
*/
uint my_uca1400_collation_id_uca400_compat(uint id)
{
uint tlid= my_uca1400_collation_id_to_tailoring_id(id);
my_cs_encoding_t csid= my_uca1400_collation_id_to_charset_id(id);
const MY_UCA1400_COLLATION_DEFINITION *def;
DBUG_ASSERT(my_collation_id_is_uca1400(id));
if (!(def= &my_uca1400_collation_definitions[tlid])->name)
return id;
switch (csid) {
case MY_CS_ENCODING_UTF8MB3: return def->id_utf8mb3;
case MY_CS_ENCODING_UTF8MB4: return def->id_utf8mb4;
case MY_CS_ENCODING_UCS2: return def->id_ucs2;
case MY_CS_ENCODING_UTF16: return def->id_utf16;
case MY_CS_ENCODING_UTF32: return def->id_utf32;
}
return id;
}
/*
Get a short (without the character set prefix) collation name
of a UCA-14.0.0 collation, e.g.
utf8mb4_uca1400_swedish_ai_ci -> uca1400_swedish_ai_ci
*/
LEX_CSTRING my_ci_get_collation_name_uca1400_context(CHARSET_INFO *cs)
{
LEX_CSTRING res;
DBUG_ASSERT(my_collation_id_is_uca1400(cs->number));
if (cs->coll_name.length <= cs->cs_name.length ||
cs->coll_name.str[cs->cs_name.length] != '_')
{
DBUG_ASSERT(0);
return cs->coll_name; /* Something went wrong, return the full name. */
}
res.str= cs->coll_name.str + cs->cs_name.length + 1;
res.length= cs->coll_name.length - cs->cs_name.length - 1;
return res;
}
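The prefix stripping above amounts to pointer arithmetic on the full name. A minimal sketch with plain C strings instead of `LEX_CSTRING`: `demo_short_name` is a hypothetical helper, and unlike the function above it also verifies the prefix with `strncmp()`, which the original does not need because the name was built from that prefix.

```c
#include <assert.h>
#include <string.h>

/*
  Return the collation name without the "<charset>_" prefix,
  or the full name if the prefix is not there.
*/
static const char *demo_short_name(const char *coll_name,
                                   const char *cs_name)
{
  size_t cs_len= strlen(cs_name);
  assert(coll_name != NULL);
  if (strlen(coll_name) <= cs_len ||
      strncmp(coll_name, cs_name, cs_len) != 0 ||
      coll_name[cs_len] != '_')
    return coll_name;            /* fall back to the full name */
  return coll_name + cs_len + 1; /* skip the prefix and the '_' */
}
```

Called with `utf8mb4_uca1400_swedish_ai_ci` and `utf8mb4`, it returns a pointer to `uca1400_swedish_ai_ci` inside the original string; no copy is made.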
/*
Preliminarily initialized data for a UCA-14.0.0 collation.
The goal is to have the "logical position" members initialized (see below).
Weight tables are initialized later, at create_tailoring() time.
*/
MY_UCA_INFO my_uca_v1400=
{
{
{
0x10FFFF, /* maxchar */
(uchar *) uca1400_length,
(uint16 **) uca1400_weight,
{ /* Contractions: */
array_elements(uca1400_contractions), /* nitems */
uca1400_contractions, /* item */
NULL /* flags */
},
0, /* levelno */
{0}, /* contraction_hash */
NULL /* booster */
},
{
0x10FFFF, /* maxchar */
(uchar *) uca1400_length_secondary,
(uint16 **) uca1400_weight_secondary,
{ /* Contractions: */
array_elements(uca1400_contractions_secondary), /* nitems */
uca1400_contractions_secondary, /* item */
NULL /* flags */
},
1, /* levelno */
{0}, /* contraction_hash */
NULL /* booster */
},
{
0x10FFFF, /* maxchar */
(uchar *) uca1400_length_tertiary,
(uint16 **) uca1400_weight_tertiary,
{ /* Contractions: */
array_elements(uca1400_contractions_tertiary), /* nitems */
uca1400_contractions_tertiary, /* item */
NULL /* flags */
},
2, /* levelno */
{0}, /* contraction_hash */
NULL /* booster */
}
},
/* Logical positions */
uca1400_non_ignorable_first,
uca1400_non_ignorable_last,
uca1400_primary_ignorable_first,
uca1400_primary_ignorable_last,
uca1400_secondary_ignorable_first,
uca1400_secondary_ignorable_last,
uca1400_tertiary_ignorable_first,
uca1400_tertiary_ignorable_last,
0x0000, /* first_trailing */
0x0000, /* last_trailing */
uca1400_variable_first,
uca1400_variable_last,
/* Misc */
uca1400_version
};
/*
An array of MY_UCA_INFO (sorting tables).
Collations having the same character set and tailoring
(but different pad and accent/case sensitivity flags)
share the same array element. Also, aliases for MySQL-8.0
UCA-9.0.0 collations share the same array element with the
corresponding UCA-14.0.0 MariaDB collations.
For example, all of these collations share one element of the array:
- utf8mb4_uca1400_swedish_ai_ci
- utf8mb4_uca1400_swedish_ai_cs
- utf8mb4_uca1400_swedish_as_ci
- utf8mb4_uca1400_swedish_as_cs
- utf8mb4_uca1400_swedish_nopad_ai_ci
- utf8mb4_uca1400_swedish_nopad_ai_cs
- utf8mb4_uca1400_swedish_nopad_as_ci
- utf8mb4_uca1400_swedish_nopad_as_cs
- utf8mb4_sv_0900_ai_ci
- utf8mb4_sv_0900_as_cs
*/
MY_UCA_INFO
my_uca1400_info_tailored[MY_CS_ENCODING_LAST+1]
[MY_UCA1400_COLLATION_DEFINITION_COUNT];
/*
Make a UCA-14.0.0 collation ID using its properties.
*/
uint my_uca1400_make_builtin_collation_id(my_cs_encoding_t charset_id,
uint tailoring_id,
my_bool nopad,
my_bool secondary_level,
my_bool tertiary_level)
{
if (!my_uca1400_collation_definitions[tailoring_id].tailoring)
return 0;
return MY_UCA1400_COLLATION_ID_POSSIBLE_MIN +
(charset_id << 8) +
(tailoring_id << 3) +
(nopad << 2) +
(secondary_level << 1) +
(tertiary_level << 0);
}
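The ID layout used above is a plain bit field: three flag bits, five tailoring bits, then the character set bits, added to a base value. The sketch below packs and unpacks it. All `demo_` names are hypothetical. The base value 2048 is an assumption, since `MY_UCA1400_COLLATION_ID_POSSIBLE_MIN` is defined outside this diff; the shifts and masks are the ones from ctype-uca1400.h.

```c
#include <assert.h>

/* Assumption: stands in for MY_UCA1400_COLLATION_ID_POSSIBLE_MIN */
#define DEMO_ID_MIN 2048u

typedef struct
{
  unsigned charset_id, tailoring_id, nopad, secondary, tertiary;
} demo_id_parts;

/* Pack the properties into an ID, mirroring the function above */
static unsigned demo_pack(demo_id_parts p)
{
  assert(p.charset_id < 8 && p.tailoring_id < 32);
  assert(p.nopad < 2 && p.secondary < 2 && p.tertiary < 2);
  return DEMO_ID_MIN +
         (p.charset_id << 8) +
         (p.tailoring_id << 3) +
         (p.nopad << 2) +
         (p.secondary << 1) +
         p.tertiary;
}

/* Unpack with the same shifts and masks as the header's decoders */
static demo_id_parts demo_unpack(unsigned id)
{
  demo_id_parts p;
  assert(id != 0);
  p.charset_id=   (id >> 8) & 0x07;
  p.tailoring_id= (id >> 3) & 0x1F;
  p.nopad=        (id >> 2) & 0x01;
  p.secondary=    (id >> 1) & 0x01;
  p.tertiary=     id & 0x01;
  return p;
}
```

Taking 2468 from the 0900 mapping table (the `german2` row) as input, unpacking gives character set 1, tailoring 20, NOPAD set, and both level bits clear, which matches an `ai_ci` alias. The decoders never subtract the base, so they only work if the base has no bits set in the low 11 positions, as 2048 does.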
/*
Make a UCA-14.0.0 full collation name as a concatenation of its
- Character set name
- UCA version
- Language rules (tailoring)
- pad characteristics
- accent sensitivity
- case sensitivity
e.g.: "utf8mb4" + "_uca1400" + "_swedish" + "_as" + "_ci"
*/
LEX_CSTRING
my_uca1400_collation_build_name(char *buffer, size_t buffer_size,
const LEX_CSTRING *cs_name,
const char *tailoring_name,
const uca_collation_def_param_t *prm)
{
LEX_CSTRING res;
res.str= buffer;
res.length=
my_snprintf(buffer, buffer_size, "%.*s_uca1400%s%s%s%s%s",
(int) cs_name->length, cs_name->str,
tailoring_name[0] ? "_" : "",
tailoring_name,
prm->nopad_flags ? "_nopad" : "",
prm->level_flags & (1<<MY_CS_LEVEL_BIT_SECONDARY) ? "_as" : "_ai",
prm->level_flags & (1<<MY_CS_LEVEL_BIT_TERTIARY) ? "_cs" : "_ci");
return res;
}
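The concatenation order is easier to see with plain flags instead of `uca_collation_def_param_t`. A minimal sketch, assuming nothing beyond the standard library: `demo_build_1400_name` is a hypothetical stand-in that takes the three properties as separate `int` arguments and uses `snprintf()` rather than `my_snprintf()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
  Build "<charset>_uca1400[_<tailoring>][_nopad]_{ai|as}_{ci|cs}".
  Returns the length of the resulting name.
*/
static size_t demo_build_1400_name(char *buffer, size_t buffer_size,
                                   const char *cs_name,
                                   const char *tailoring_name,
                                   int nopad,
                                   int accent_sensitive,
                                   int case_sensitive)
{
  int n;
  assert(buffer_size > 0);
  n= snprintf(buffer, buffer_size, "%s_uca1400%s%s%s%s%s",
              cs_name,
              tailoring_name[0] ? "_" : "",
              tailoring_name,
              nopad ? "_nopad" : "",
              accent_sensitive ? "_as" : "_ai",
              case_sensitive ? "_cs" : "_ci");
  /* This sketch treats truncation as a programming error */
  assert(n >= 0 && (size_t) n < buffer_size);
  return (size_t) n;
}
```

The arguments `("utf8mb4", "swedish", 0, 1, 0)` reproduce the example from the comment, `utf8mb4_uca1400_swedish_as_ci`. An empty tailoring name with NOPAD set gives `utf8mb4_uca1400_nopad_ai_ci`.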
/*
For extra safety let's define and check a set of flags
which are not expected for UCA 1400 collations.
*/
static inline uint
uca1400_unexpected_flags()
{
return MY_CS_BINSORT|
MY_CS_PRIMARY|
MY_CS_PUREASCII|
MY_CS_LOWER_SORT;
}
/*
Perform a preliminary initialization of a charset_info_st instance.
It's enough for SHOW and INFORMATION_SCHEMA queries.
Deep initialization will be done later, when the collation is
used for the first time. See create_tailoring().
*/
static void
my_uca1400_collation_definition_init(MY_CHARSET_LOADER *loader,
struct charset_info_st *dst,
const uca_collation_def_param_t *param)
{
const MY_UCA1400_COLLATION_DEFINITION *def=
&my_uca1400_collation_definitions[param->tailoring_id];
/* Copy the entire charset_info_st from a compiled-in one. */
*dst= *my_uca0520_builtin_collation_by_id(param->cs_id, param->nopad_flags);
/* Now replace some members according to param */
DBUG_ASSERT((dst->state & uca1400_unexpected_flags()) == 0);
dst->uca= &my_uca_v1400;
dst->tailoring= def->tailoring;
if (def->tailoring == my_charset_utf8mb4_turkish_uca_ci.tailoring)
dst->casefold= &my_casefold_unicode1400tr;
else
dst->casefold= &my_casefold_unicode1400;
dst->state|= param->nopad_flags;
my_ci_set_level_flags(dst, param->level_flags);
}
/*
Allocate memory for a new charset_info_st instance together
with its name and comment.
Perform preliminary initialization, then add to the list
of available collations using MY_CHARSET_LOADER::add_collation.
*/
my_bool
my_uca1400_collation_alloc_and_init(MY_CHARSET_LOADER *loader,
LEX_CSTRING name,
LEX_CSTRING comment,
const uca_collation_def_param_t *param,
uint id)
{
struct charset_info_st *dst;
if (!(dst= my_ci_alloc(loader, name, &name, comment, &comment)))
return TRUE;
my_uca1400_collation_definition_init(loader, dst, param);
dst->number= id;
dst->coll_name= name;
dst->comment= comment.str;
return (loader->add_collation)(dst) != 0;
}
/*
Make a UCA-14.0.0 full collation name using its id,
then allocate and add the collation.
*/
static
my_bool my_uca1400_collation_definition_add(MY_CHARSET_LOADER *loader, uint id)
{
char coll_name_buffer[MY_CS_COLLATION_NAME_SIZE + 1];
LEX_CSTRING coll_name;
LEX_CSTRING comment= {"",0};
uca_collation_def_param_t param= my_uca1400_collation_param_by_id(id);
CHARSET_INFO *src= my_uca0520_builtin_collation_by_id(param.cs_id,
param.nopad_flags);
const MY_UCA1400_COLLATION_DEFINITION *def=
&my_uca1400_collation_definitions[param.tailoring_id];
coll_name= my_uca1400_collation_build_name(coll_name_buffer,
sizeof(coll_name_buffer),
&src->cs_name,
def->name,
&param);
return my_uca1400_collation_alloc_and_init(loader, coll_name, comment,
&param, id);
}
/*
Add UCA-14.0.0 collations for all combinations of:
- Unicode character sets (utf8mb3, utf8mb4, ucs2, utf16, utf32)
- language rules (tailorings)
- pad properties
- accent sensitivity
- case sensitivity
*/
my_bool my_uca1400_collation_definitions_add(MY_CHARSET_LOADER *loader)
{
my_cs_encoding_t charset_id;
for (charset_id= (my_cs_encoding_t) 0;
charset_id <= (my_cs_encoding_t) MY_CS_ENCODING_LAST;
charset_id++)
{
uint tailoring_id;
for (tailoring_id= 0 ;
tailoring_id < MY_UCA1400_COLLATION_DEFINITION_COUNT;
tailoring_id++)
{
my_bool nopad; /* PAD / NOPAD */
for (nopad= 0; nopad < 2; nopad++)
{
my_bool secondary_level; /* ai / as */
for (secondary_level= 0; secondary_level < 2; secondary_level++)
{
my_bool tertiary_level; /* ci / cs */
for (tertiary_level= 0; tertiary_level < 2; tertiary_level++)
{
uint id= my_uca1400_make_builtin_collation_id(charset_id,
tailoring_id,
nopad,
secondary_level,
tertiary_level);
if (id && my_uca1400_collation_definition_add(loader, id))
return TRUE;
}
}
}
}
}
return FALSE;
}
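The nested loops above register eight collations (2 pad modes x 2 accent modes x 2 case modes) per character set and defined tailoring. The sketch below only counts them, which gives a feel for how many `charset_info_st` instances get the cheap preliminary initialization at startup. `demo_count_collations` and the `tailoring_defined` array are hypothetical; the array plays the role of the `.tailoring` check in `my_uca1400_make_builtin_collation_id()`.

```c
#include <assert.h>
#include <stddef.h>

/*
  Count the collations the nested loops would add.
  tailoring_defined[i] is non-zero when tailoring i exists.
*/
static size_t demo_count_collations(size_t n_charsets,
                                    const unsigned char *tailoring_defined,
                                    size_t n_tailorings)
{
  size_t count= 0, cs, tl;
  unsigned nopad, secondary, tertiary;
  assert(tailoring_defined != NULL);
  for (cs= 0; cs < n_charsets; cs++)
  {
    for (tl= 0; tl < n_tailorings; tl++)
    {
      if (!tailoring_defined[tl])
        continue;                /* such IDs come back as 0 above */
      for (nopad= 0; nopad < 2; nopad++)
        for (secondary= 0; secondary < 2; secondary++)
          for (tertiary= 0; tertiary < 2; tertiary++)
            count++;
    }
  }
  return count;
}
```

With the five Unicode character sets named in the comment and all 26 definition slots in use, the upper bound is 5 x 26 x 8 = 1040; every undefined slot removes 40 from that total.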


@ -187,6 +187,19 @@ my_collation_id_is_uca1400(uint id)
id <= MY_UCA1400_COLLATION_ID_POSSIBLE_MAX);
}
typedef struct my_uca1400_collation_definition_st
{
const char * tailoring;
const char * name;
uint16 id_utf8mb3;
uint16 id_utf8mb4;
uint16 id_ucs2;
uint16 id_utf16;
uint16 id_utf32;
} MY_UCA1400_COLLATION_DEFINITION;
/*
UCA1400 collation ID:
@ -204,6 +217,7 @@ my_collation_id_is_uca1400(uint id)
static inline my_cs_encoding_t
my_uca1400_collation_id_to_charset_id(uint id)
{
DBUG_ASSERT(id);
return (my_cs_encoding_t) ((id >> 8) & 0x07);
}
@ -211,6 +225,7 @@ my_uca1400_collation_id_to_charset_id(uint id)
static inline uint
my_uca1400_collation_id_to_tailoring_id(uint id)
{
DBUG_ASSERT(id);
return (id >> 3) & 0x1F;
}
@ -218,21 +233,52 @@ my_uca1400_collation_id_to_tailoring_id(uint id)
static inline my_bool
my_uca1400_collation_id_to_nopad_flag(uint id)
{
DBUG_ASSERT(id);
return (my_bool) ((id >> 2) & 0x01);
}
static inline my_bool
my_uca1400_collation_id_to_secondary_level_flag(uint id)
{
DBUG_ASSERT(id);
return (my_bool) ((id >> 1) & 0x01);
}
static inline my_bool
my_uca1400_collation_id_to_tertiary_level_flag(uint id)
{
DBUG_ASSERT(id);
return (my_bool) ((id >> 0) & 0x01);
}
static inline uint
my_uca1400_collation_id_to_level_flags(uint id)
{
my_bool secondary_level, tertiary_level;
DBUG_ASSERT(id);
secondary_level= my_uca1400_collation_id_to_secondary_level_flag(id);
tertiary_level= my_uca1400_collation_id_to_tertiary_level_flag(id);
return (1 << MY_CS_LEVEL_BIT_PRIMARY) |
(secondary_level ? 1 << MY_CS_LEVEL_BIT_SECONDARY : 0) |
(tertiary_level ? 1 << MY_CS_LEVEL_BIT_TERTIARY : 0);
}
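The level flags are derived from the two lowest ID bits, with the primary level always enabled. A minimal sketch of that mapping follows; the `DEMO_LEVEL_*` bit positions 0, 1 and 2 are an assumption, because the real `MY_CS_LEVEL_BIT_*` constants are defined outside this diff.

```c
#include <assert.h>

/* Assumption: bit positions standing in for MY_CS_LEVEL_BIT_* */
enum
{
  DEMO_LEVEL_PRIMARY= 0,
  DEMO_LEVEL_SECONDARY= 1,
  DEMO_LEVEL_TERTIARY= 2
};

/* Same logic as the function above, without the helper calls */
static unsigned demo_level_flags(unsigned id)
{
  unsigned secondary= (id >> 1) & 0x01;   /* accent sensitive */
  unsigned tertiary= id & 0x01;           /* case sensitive */
  assert(id != 0);
  return (1u << DEMO_LEVEL_PRIMARY) |
         (secondary ? 1u << DEMO_LEVEL_SECONDARY : 0) |
         (tertiary ? 1u << DEMO_LEVEL_TERTIARY : 0);
}
```

Under these assumed bit positions, the IDs 2308 (`ai_ci`), 2310 (`as_ci`) and 2311 (`as_cs`) from the 0900 mapping table produce the flag values 1, 3 and 7.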
/*
Return the properties of a UCA-14.0.0 collation using its ID.
*/
static inline uca_collation_def_param_t
my_uca1400_collation_param_by_id(uint id)
{
uca_collation_def_param_t res;
DBUG_ASSERT(id);
res.cs_id= my_uca1400_collation_id_to_charset_id(id);
res.tailoring_id= my_uca1400_collation_id_to_tailoring_id(id);
res.nopad_flags= my_uca1400_collation_id_to_nopad_flag(id);
res.level_flags= my_uca1400_collation_id_to_level_flags(id);
return res;
}
uint
my_uca1400_make_builtin_collation_id(my_cs_encoding_t charset_id,
@ -241,13 +287,36 @@ my_uca1400_make_builtin_collation_id(my_cs_encoding_t charset_id,
my_bool secondary_level,
my_bool tertiary_level);
my_bool
my_uca1400_collation_definition_init(MY_CHARSET_LOADER *loader,
struct charset_info_st *dst,
uint collation_id);
LEX_CSTRING
my_uca1400_collation_build_name(char *buffer, size_t buffer_size,
const LEX_CSTRING *cs_name,
const char *tailoring_name,
const uca_collation_def_param_t *prm);
my_bool
my_uca1400_collation_alloc_and_init(MY_CHARSET_LOADER *loader,
LEX_CSTRING name,
LEX_CSTRING comment,
const uca_collation_def_param_t *param,
uint id);
LEX_CSTRING my_ci_get_collation_name_uca1400_context(CHARSET_INFO *cs);
uint my_uca1400_collation_id_uca400_compat(uint id);
my_bool my_uca1400_collation_definitions_add(MY_CHARSET_LOADER *loader);
/* Exported data */
#define MY_UCA1400_COLLATION_DEFINITION_COUNT 26
my_bool mysql_utf8mb4_0900_collation_definitions_add();
extern MY_UCA1400_COLLATION_DEFINITION
my_uca1400_collation_definitions[MY_UCA1400_COLLATION_DEFINITION_COUNT];
extern MY_UCA_INFO my_uca_v1400;
extern MY_UCA_INFO my_uca1400_info_tailored[MY_CS_ENCODING_LAST+1]
[MY_UCA1400_COLLATION_DEFINITION_COUNT];
#endif /* CTYPE_UCA_1400_H */


@ -1505,7 +1505,8 @@ static MY_COLLATION_HANDLER my_collation_utf16_general_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1526,7 +1527,8 @@ static MY_COLLATION_HANDLER my_collation_utf16_bin_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1547,7 +1549,8 @@ static MY_COLLATION_HANDLER my_collation_utf16_general_nopad_ci_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1568,7 +1571,8 @@ static MY_COLLATION_HANDLER my_collation_utf16_nopad_bin_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1858,7 +1862,8 @@ static MY_COLLATION_HANDLER my_collation_utf16le_general_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1879,7 +1884,8 @@ static MY_COLLATION_HANDLER my_collation_utf16le_bin_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1900,7 +1906,8 @@ static MY_COLLATION_HANDLER my_collation_utf16le_general_nopad_ci_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1921,7 +1928,8 @@ static MY_COLLATION_HANDLER my_collation_utf16le_nopad_bin_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -2663,7 +2671,8 @@ static MY_COLLATION_HANDLER my_collation_utf32_general_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -2684,7 +2693,8 @@ static MY_COLLATION_HANDLER my_collation_utf32_bin_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -2705,7 +2715,8 @@ static MY_COLLATION_HANDLER my_collation_utf32_general_nopad_ci_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -2726,7 +2737,8 @@ static MY_COLLATION_HANDLER my_collation_utf32_nopad_bin_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -3263,7 +3275,8 @@ static MY_COLLATION_HANDLER my_collation_ucs2_general_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -3284,7 +3297,8 @@ static MY_COLLATION_HANDLER my_collation_ucs2_general_mysql500_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -3305,7 +3319,8 @@ static MY_COLLATION_HANDLER my_collation_ucs2_bin_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -3326,7 +3341,8 @@ static MY_COLLATION_HANDLER my_collation_ucs2_general_nopad_ci_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -3347,7 +3363,8 @@ static MY_COLLATION_HANDLER my_collation_ucs2_nopad_bin_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -67261,7 +67261,8 @@ static MY_COLLATION_HANDLER my_collation_ujis_japanese_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -67282,7 +67283,8 @@ static MY_COLLATION_HANDLER my_collation_ujis_bin_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -67303,7 +67305,8 @@ static MY_COLLATION_HANDLER my_collation_ujis_japanese_nopad_ci_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -67324,7 +67327,8 @@ static MY_COLLATION_HANDLER my_collation_ujis_nopad_bin_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -1119,7 +1119,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb3_general_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1140,7 +1141,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb3_general_mysql500_ci_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1161,7 +1163,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb3_bin_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1182,7 +1185,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb3_general_nopad_ci_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1203,7 +1207,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb3_nopad_bin_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -1529,7 +1534,8 @@ static MY_COLLATION_HANDLER my_collation_cs_handler =
my_hash_sort_utf8mb3,
my_propagate_simple,
my_min_str_mb_simple,
my_max_str_mb_simple
my_max_str_mb_simple,
my_ci_eq_collation_generic
};
struct charset_info_st my_charset_utf8mb3_general_cs=
@ -2848,7 +2854,8 @@ static MY_COLLATION_HANDLER my_collation_filename_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
static MY_CHARSET_HANDLER my_charset_filename_handler=
@ -3393,6 +3400,19 @@ my_charlen_utf8mb4(CHARSET_INFO *cs __attribute__((unused)),
}
static my_bool
my_ci_eq_collation_utf8mb4_bin(CHARSET_INFO *a, CHARSET_INFO *b)
{
return a->cset == b->cset &&
a->coll == b->coll &&
a->uca == b->uca && a->uca == NULL &&
a->casefold == b->casefold &&
(a->state & MY_CS_NOPAD) == (b->state & MY_CS_NOPAD) &&
a->levels_for_order == b->levels_for_order &&
a->tailoring == b->tailoring && a->tailoring == NULL;
}
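The equality check above deliberately ignores the collation number and name, which is what lets an alias such as `utf8mb4_0900_bin` compare equal to the collation it was copied from. A minimal sketch of the idea follows. `struct demo_coll` is a hypothetical, reduced stand-in for `charset_info_st`, and `DEMO_NOPAD` stands in for `MY_CS_NOPAD`, whose real value is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NOPAD 0x1u   /* hypothetical bit, stands in for MY_CS_NOPAD */

struct demo_coll          /* reduced stand-in for charset_info_st */
{
  unsigned number;
  const char *name;
  const void *cset, *coll, *uca, *casefold;
  unsigned state;
  unsigned levels_for_order;
  const char *tailoring;
};

/* Compare behaviour only; number and name are not looked at */
static int demo_eq_bin(const struct demo_coll *a, const struct demo_coll *b)
{
  assert(a != NULL && b != NULL);
  return a->cset == b->cset &&
         a->coll == b->coll &&
         a->uca == b->uca && a->uca == NULL &&
         a->casefold == b->casefold &&
         (a->state & DEMO_NOPAD) == (b->state & DEMO_NOPAD) &&
         a->levels_for_order == b->levels_for_order &&
         a->tailoring == b->tailoring && a->tailoring == NULL;
}
```

A struct copy that changes only `number` and `name` still compares equal, while flipping the NOPAD bit makes the two differ.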
#define MY_FUNCTION_NAME(x) my_ ## x ## _utf8mb4
#define CHARLEN(cs,str,end) my_charlen_utf8mb4(cs,str,end)
#define DEFINE_WELL_FORMED_CHAR_LENGTH_USING_CHARLEN
@ -3475,7 +3495,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb4_general_ci_handler=
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -3496,7 +3517,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb4_bin_handler =
my_min_str_mb_simple,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_utf8mb4_bin
};
@ -3517,7 +3539,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb4_general_nopad_ci_handler=
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};
@ -3538,7 +3561,8 @@ static MY_COLLATION_HANDLER my_collation_utf8mb4_nopad_bin_handler =
my_min_str_mb_simple_nopad,
my_max_str_mb_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_utf8mb4_bin
};


@ -690,7 +690,8 @@ static MY_COLLATION_HANDLER my_collation_czech_cs_handler =
my_min_str_8bit_simple,
my_max_str_8bit_simple,
my_ci_get_id_generic,
my_ci_get_collation_name_generic
my_ci_get_collation_name_generic,
my_ci_eq_collation_generic
};


@ -1413,3 +1413,41 @@ uint my_casefold_multiply_2(CHARSET_INFO *cs)
{
return 2;
}
my_bool my_ci_eq_collation_generic(CHARSET_INFO *self, CHARSET_INFO *other)
{
return FALSE;
}
/*
Allocate a memory block for a new charset_info_st together with
its name and its comment in a single once_alloc() call.
Copy the name and the comment into the new block.
*/
struct charset_info_st *my_ci_alloc(MY_CHARSET_LOADER *loader,
const LEX_CSTRING name,
LEX_CSTRING *out_name,
const LEX_CSTRING comment,
LEX_CSTRING *out_comment)
{
size_t nbytes= sizeof(struct charset_info_st) +
name.length + comment.length + 2;
struct charset_info_st *csinfo;
char *dst;
if (!(csinfo= (struct charset_info_st*) (loader->once_alloc)(nbytes)))
return NULL;
dst= ((char*) csinfo) + sizeof(struct charset_info_st);
memcpy(dst, name.str, name.length + 1);
out_name->str= dst;
out_name->length= name.length;
dst+= name.length + 1;
memcpy(dst, comment.str, comment.length + 1);
out_comment->str= dst;
out_comment->length= comment.length;
return csinfo;
}
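The single-block layout used by `my_ci_alloc()` is: the struct first, then the NUL-terminated name, then the NUL-terminated comment. The sketch below shows the same layout with `malloc()` in place of the loader's `once_alloc` callback. `struct demo_info` and `demo_alloc` are hypothetical. Unlike the original, the sketch stores the string pointers in the struct directly; the original hands them back through `out_name` and `out_comment` because its callers overwrite the whole struct afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_info               /* reduced stand-in for charset_info_st */
{
  unsigned number;
  const char *name;
  const char *comment;
};

/*
  Allocate the struct, the name and the comment in one block.
  The caller releases everything with a single free().
*/
static struct demo_info *demo_alloc(const char *name, const char *comment)
{
  size_t name_len, comment_len;
  struct demo_info *info;
  char *dst;
  assert(name != NULL && comment != NULL);
  name_len= strlen(name);
  comment_len= strlen(comment);
  /* +2 for the two terminating NUL bytes */
  info= (struct demo_info *) malloc(sizeof(*info) +
                                    name_len + comment_len + 2);
  if (!info)
    return NULL;
  dst= (char *) info + sizeof(*info);
  memcpy(dst, name, name_len + 1);
  info->name= dst;
  dst+= name_len + 1;
  memcpy(dst, comment, comment_len + 1);
  info->comment= dst;
  info->number= 0;
  return info;
}
```

One allocation per collation keeps the name and comment alive exactly as long as the struct itself, which suits memory that is never freed individually.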


@ -20,6 +20,7 @@
#undef DBUG_ASSERT_AS_PRINTF
#include <my_global.h> /* Define standard vars */
#include "m_string.h" /* External definitions of string functions */
#include "m_ctype.h"
/*
We can't use the original DBUG_ASSERT() (which includes _db_flush())
@ -148,6 +149,13 @@ void my_ci_set_level_flags(struct charset_info_st *cs, uint flags);
uint my_casefold_multiply_1(CHARSET_INFO *cs);
uint my_casefold_multiply_2(CHARSET_INFO *cs);
my_bool my_ci_eq_collation_generic(CHARSET_INFO *self, CHARSET_INFO *other);
struct charset_info_st *my_ci_alloc(MY_CHARSET_LOADER *loader,
const LEX_CSTRING name,
LEX_CSTRING *out_name,
const LEX_CSTRING comment,
LEX_CSTRING *out_comment);
/* Some common character set names */
extern const char charset_name_latin2[];