1
0
mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00
Commit Graph

24 Commits

Author SHA1 Message Date
Alexander Barkov
cf644785e1 MDEV-36503 add Pad_attribute column to INFORMATION_SCHEMA.COLLATIONS
Adding a new column in INFORMATION_SCHEMA.COLLATIONS:

  PAD_ATTRIBUTE ENUM('PAD SPACE','NO PAD')

and a new column Pad_attribute into SHOW COLLATION.

The new column has been added after SORTLEN but before COMMENT.
This order is compatible with MySQL-8.0 order,
with the exception that MariaDB has an extra last column COMMENT:

MariaDB [test]> desc information_schema.collations;
+--------------------+----------------------------+------+-----+---------+-------+
| Field              | Type                       | Null | Key | Default | Extra |
+--------------------+----------------------------+------+-----+---------+-------+
| COLLATION_NAME     | varchar(64)                | NO   |     | NULL    |       |
| CHARACTER_SET_NAME | varchar(32)                | YES  |     | NULL    |       |
| ID                 | bigint(11)                 | YES  |     | NULL    |       |
| IS_DEFAULT         | varchar(3)                 | YES  |     | NULL    |       |
| IS_COMPILED        | varchar(3)                 | NO   |     | NULL    |       |
| SORTLEN            | bigint(3)                  | NO   |     | NULL    |       |
| PAD_ATTRIBUTE      | enum('PAD SPACE','NO PAD') | NO   |     | NULL    |       |
| COMMENT            | varchar(80)                | NO   |     | NULL    |       |
+--------------------+----------------------------+------+-----+---------+-------+

The new Pad_attribute in SHOW COLLATION has been added as the last column.
This is also compatible with MySQL:

MariaDB [test]> show collation like 'utf8mb4_bin';
+-------------+---------+------+---------+----------+---------+---------------+
| Collation   | Charset | Id   | Default | Compiled | Sortlen | Pad_attribute |
+-------------+---------+------+---------+----------+---------+---------------+
| utf8mb4_bin | utf8mb4 |   46 |         | Yes      |       1 | PAD SPACE     |
+-------------+---------+------+---------+----------+---------+---------------+
2025-05-19 17:07:18 +04:00
Sergei Golubchik
ba01c2aaf0 Merge branch '11.4' into 11.7
* rpl.rpl_system_versioning_partitions updated for MDEV-32188
* innodb.row_size_error_log_warnings_3 changed error for MDEV-33658
  (checks are done in a different order)
2025-02-06 16:46:36 +01:00
Sergei Golubchik
f1da9dbf5e MDEV-20912 fix test results
followup for 7fcaab7aaa
2025-01-15 09:51:19 +01:00
Alexander Barkov
36eba98817 MDEV-19123 Change default charset from latin1 to utf8mb4
Changing the default server character set from latin1 to utf8mb4.
2024-07-11 10:21:07 +04:00
Alexander Barkov
903b5d6a83 MDEV-25829 Change default Unicode collation to uca1400_ai_ci
Step#3 The main patch
2024-05-24 15:50:05 +04:00
Alexander Barkov
fd247cc21f MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp()
This patch also fixes:
  MDEV-33050 Build-in schemas like oracle_schema are accent insensitive
  MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0
  MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0
  MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0
  MDEV-33088 Cannot create triggers in the database `MYSQL`
  MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0
  MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0
  MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0
  MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS
  MDEV-33120 System log table names are case insensitive with lower-cast-table-names=0

- Removing the virtual function strnncoll() from MY_COLLATION_HANDLER

- Adding a wrapper function CHARSET_INFO::streq(), to compare
  two strings for equality. For now it calls strnncoll() internally.
  In the future it will turn into a virtual function.

- Adding new accent sensitive case insensitive collations:
    - utf8mb4_general1400_as_ci
    - utf8mb3_general1400_as_ci
  They implement accent sensitive case insensitive comparison.
  The weight of a character is equal to the code point of its
  upper case variant. These collations use Unicode-14.0.0 casefolding data.

  The result of
     my_charset_utf8mb3_general1400_as_ci.strcoll()
  is very close to the former
     my_charset_utf8mb3_general_ci.strcasecmp()

  There is only a difference in a couple dozen rare characters, because:
    - the switch from "tolower" to "toupper" comparison, to make
      utf8mb3_general1400_as_ci closer to utf8mb3_general_ci
    - the switch from Unicode-3.0.0 to Unicode-14.0.0
  This difference should be tolarable. See the list of affected
  characters in the MDEV description.

  Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
  Unlike utf8mb4_general_ci, it does not treat all BMP characters
  as equal.

- Adding classes representing names of the file based database objects:

    Lex_ident_db
    Lex_ident_table
    Lex_ident_trigger

  Their comparison collation depends on the underlying
  file system case sensitivity and on --lower-case-table-names
  and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci.

- Adding classes representing names of other database objects,
  whose names have case insensitive comparison style,
  using my_charset_utf8mb3_general1400_as_ci:

  Lex_ident_column
  Lex_ident_sys_var
  Lex_ident_user_var
  Lex_ident_sp_var
  Lex_ident_ps
  Lex_ident_i_s_table
  Lex_ident_window
  Lex_ident_func
  Lex_ident_partition
  Lex_ident_with_element
  Lex_ident_rpl_filter
  Lex_ident_master_info
  Lex_ident_host
  Lex_ident_locale
  Lex_ident_plugin
  Lex_ident_engine
  Lex_ident_server
  Lex_ident_savepoint
  Lex_ident_charset
  engine_option_value::Name

- All the mentioned Lex_ident_xxx classes implement a method streq():

  if (ident1.streq(ident2))
     do_equal();

  This method works as a wrapper for CHARSET_INFO::streq().

- Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name"
  in class members and in function/method parameters.

- Replacing all calls like
    system_charset_info->coll->strcasecmp(ident1, ident2)
  to
    ident1.streq(ident2)

- Taking advantage of the c++11 user defined literal operator
  for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h)
  data types. Use example:

  const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column;

  is now a shorter version of:

  const Lex_ident_column primary_key_name=
    Lex_ident_column({STRING_WITH_LEN("PRIMARY")});
2024-04-18 15:22:10 +04:00
Alexander Barkov
133446828c MDEV-27009 Add UCA-14.0.0 collations
- Added one neutral and 22 tailored (language specific) collations based on
  Unicode Collation Algorithm version 14.0.0.

  Collations were added for Unicode character sets
  utf8mb3, utf8mb4, ucs2, utf16, utf32.

  Every tailoring was added with four accent and case
  sensitivity flag combinations, e.g:

  * utf8mb4_uca1400_swedish_as_cs
  * utf8mb4_uca1400_swedish_as_ci
  * utf8mb4_uca1400_swedish_ai_cs
  * utf8mb4_uca1400_swedish_ai_ci

  and their _nopad_ variants:

  * utf8mb4_uca1400_swedish_nopad_as_cs
  * utf8mb4_uca1400_swedish_nopad_as_ci
  * utf8mb4_uca1400_swedish_nopad_ai_cs
  * utf8mb4_uca1400_swedish_nopad_ai_ci

- Introducing a conception of contextually typed named collations:

  CREATE DATABASE db1 CHARACTER SET utf8mb4;
  CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci);

  The idea is that there is no a need to specify the character set prefix
  in the new collation names. It's enough to type just the suffix
  "uca1400_as_ci". The character set is taken from the context.

  In the above example script the context character set is utf8mb4.
  So the CREATE TABLE will make a column with the collation
  utf8mb4_uca1400_as_ci.

  Short collations names can be used in any parts of the SQL syntax
  where the COLLATE clause is understood.

- New collations are displayed only one time
  (without character set combinations) by these statements:

     SELECT * FROM INFORMATION_SCHEMA.COLLATIONS;
     SHOW COLLATION;

  For example, all these collations:
  - utf8mb3_uca1400_swedish_as_ci
  - utf8mb4_uca1400_swedish_as_ci
  - ucs2_uca1400_swedish_as_ci
  - utf16_uca1400_swedish_as_ci
  - utf32_uca1400_swedish_as_ci
  have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION,
  with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix
  without the character set name:

SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';

+-----------------------+
| COLLATION_NAME        |
+-----------------------+
| uca1400_swedish_as_ci |
+-----------------------+

  Note, the behaviour of old collations did not change.
  Non-unicode collations (e.g. latin1_swedish_ci) and
  old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci)
  are still displayed with the character set prefix, as before.

- The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed.

  The NOT NULL constraint was removed from these columns:
  - CHARACTER_SET_NAME
  - ID
  - IS_DEFAULT
  and from the corresponding columns in SHOW COLLATION.

  For example:

SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci';
+-----------------------+--------------------+------+------------+
| COLLATION_NAME        | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
+-----------------------+--------------------+------+------------+
| uca1400_swedish_as_ci | NULL               | NULL | NULL       |
+-----------------------+--------------------+------+------------+

  The NULL value in these columns now means that the collation
  is applicable to multiple character sets.
  The behavioir of old collations did not change.
  Make sure your client programs can handle NULL values in these columns.

- The structure of the table
  INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed.

  Three new NOT NULL columns were added:
  - FULL_COLLATION_NAME
  - ID
  - IS_DEFAULT

  New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY.
  The column COLLATION_NAME contains the collation name without the character
  set prefix. The column FULL_COLLATION_NAME contains the collation name with
  the character set prefix.

  Old collations have full collation name in both FULL_COLLATION_NAME and
  COLLATION_NAME.

SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT
FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4|latin1).*swedish.*ci$';
+-----------------------------+-------------------------------------+--------------------+------+------------+
| COLLATION_NAME              | FULL_COLLATION_NAME                 | CHARACTER_SET_NAME | ID   | IS_DEFAULT |
+-----------------------------+-------------------------------------+--------------------+------+------------+
| latin1_swedish_ci           | latin1_swedish_ci                   | latin1             |    8 | Yes        |
| latin1_swedish_nopad_ci     | latin1_swedish_nopad_ci             | latin1             | 1032 |            |
| utf8mb4_swedish_ci          | utf8mb4_swedish_ci                  | utf8mb4            |  232 |            |
| uca1400_swedish_ai_ci       | utf8mb4_uca1400_swedish_ai_ci       | utf8mb4            | 2368 |            |
| uca1400_swedish_as_ci       | utf8mb4_uca1400_swedish_as_ci       | utf8mb4            | 2370 |            |
| uca1400_swedish_nopad_ai_ci | utf8mb4_uca1400_swedish_nopad_ai_ci | utf8mb4            | 2372 |            |
| uca1400_swedish_nopad_as_ci | utf8mb4_uca1400_swedish_nopad_as_ci | utf8mb4            | 2374 |            |
+-----------------------------+-------------------------------------+--------------------+------+------------+

- Other INFORMATION_SCHEMA queries:

  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS;
  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS;
  SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES;
  SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA;
  SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS;
  SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS;
  SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS;

  display full collation names, including character sets prefix,
  for all collations, including new collations.

  Corresponding SHOW commands also display full collation names
  in collation related columns:

  SHOW CREATE TABLE t1;
  SHOW CREATE DATABASE db1;
  SHOW TABLE STATUS;
  SHOW CREATE FUNCTION f1;
  SHOW CREATE PROCEDURE p1;
  SHOW CREATE EVENT ev1;
  SHOW CREATE TRIGGER tr1;
  SHOW CREATE VIEW;

  These INFORMATION_SCHEMA queries and SHOW statements may change in
  the future, to display show collation names.
2022-08-10 15:04:24 +02:00
Rucha Deodhar
2fdb556e04 MDEV-8334: Rename utf8 to utf8mb3
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.
2021-05-19 06:48:36 +02:00
Vladislav Vaintroub
aa2ff62082 MDEV-9077 Use sys schema in bootstrapping, incl. mtr 2021-03-18 08:02:48 +01:00
Robert Bindar
042f5165e3 MDEV-307 review minor edits, add yacc_ora support 2019-05-21 09:33:30 +03:00
Alexander Barkov
0f8a1a314d MDEV-10877 xxx_unicode_nopad_ci collations 2016-09-23 14:19:07 +04:00
Alexander Barkov
a032fd5c1c MDEV-10758 engines/funcs.db_alter_collate_ascii and engines/funcs.db_alter_collate_utf8 fail with wrong results 2016-09-07 16:43:45 +04:00
Alexander Barkov
bc546225c0 Adding collations
utf8mb4_thai_520_w2, ucs2_thai_520_w2, utf16_thai_520_w2, utf32_thai_520_w2
2016-05-30 16:56:29 +04:00
Alexander Barkov
29db3b5e5c Clean-ups for MDEV-10132 utf8_thai_520_w2 collation:
- Changing strnxfrm_multiply from 8 to 4, as agreed with Pruet Boonma
- Adjusting tests
2016-05-26 22:56:28 +04:00
Alexander Barkov
a6e5ac2279 MDEV-4929 Myanmar collation 2013-12-20 12:42:33 +04:00
Alexander Barkov
fad7371191 Merging xxx_unicode_520_ci and xxx_vietnamese_ci from MySQL-5.6. 2013-11-12 16:48:57 +04:00
Michael Widenius
192678e7bf MDEV-5241: Collation incompatibilities with MySQL-5.6
- Character set code & tests from Alexander Barkov
- Integration with ALTER TABLE, REPAIR and open_table from Monty

The problem was that MySQL 5.6 added some croatian and vitanamese character set collations that are incompatible with MariaDB.

The fix is to move the MariaDB conflicting collation numbers out of the region that MySQL is likely to use.
mysql_upgrade, REPAIR TABLE or ALTER TABLE will fix the collations.
If one tries to access and old incompatible table, one will get the error "Table upgrade required...."
After this patch, MariaDB supports all the MySQL character set collations and the old MariaDB croatian collations, which are closer to the latest standard than the MySQL versions.

New character sets:
ucs2_croatian_mysql561_uca_ci
utf8_croatian_mysql561_uca_ci
utf16_croatian_mysql561_uca_ci
utf32_croatian_mysql561_uca_ci
utf8mb4_croatian_mysql561_uca_ci

Other things:
- Fixed some compiler warnings
- mysql_upgrade prints information about repaired tables.
- Increased version number

VERSION:
  Increased VERSION number
client/mysqlcheck.c:
  Print repaired table name when using --verbose
include/m_ctype.h:
  Add new MariaDB collation regions that are not likely to conflict with MySQL
include/my_base.h:
  Added flag to detect if table was opened for ALTER TABLE
mysql-test/r/ctype_ldml.result:
  Updated result
mysql-test/r/ctype_uca.result:
  Updated result
mysql-test/r/ctype_upgrade.result:
  Updated result
mysql-test/r/ctype_utf16_uca.result:
  Updated result
mysql-test/r/ctype_utf32_uca.result:
  Updated result
mysql-test/r/ctype_utf8mb4_uca.result:
  Updated result
mysql-test/std_data/ctype_upgrade:
  Test files for testing upgrading of conflicting collations
mysql-test/suite/engines/funcs/r/db_alter_collate_ascii.result:
  New collations added
mysql-test/suite/engines/funcs/r/db_alter_collate_utf8.result:
  New collations added
mysql-test/suite/innodb/r/innodb_ctype_ldml.result:
  Updated test result
mysql-test/suite/innodb/t/innodb_ctype_ldml.test:
  Updated test result
mysql-test/suite/plugins/r/show_all_plugins.result:
  Updated version number
mysql-test/suite/roles/create_and_drop_role_invalid_user_table.result:
  Updated version number
mysql-test/t/ctype_ldml.test:
  Updated test
mysql-test/t/ctype_uca.test:
  Testing of new collations
mysql-test/t/ctype_upgrade.test:
  Testing of upgrading tables with old collations
  The test ensures that:
  - We will get an error if we try to open a table with old collations.
  - CHECK TABLE will detect that the table needs to be upgraded.
  - ALTER TABLE and REPAIR will fix the table.
  - mysql_upgrade works as expected
mysql-test/t/ctype_utf16_uca.test:
  Testing of new collations
mysql-test/t/ctype_utf32_uca.test:
  Testing of new collations
mysql-test/t/ctype_utf8mb4_uca.test:
  Testing of new collations
mysys/charset-def.c:
  Added new character sets
mysys/charset.c:
  Always give an error, if requested, if a character set didn't exist
sql/handler.cc:
  - Added upgrade_collation() to check if collation is compatible with old version
  - check_collation_compatibility() checks if we are using an old collation from MariaDB 5.5 or MySQL 5.6
  - ha_check_for_upgrade() returns HA_ADMIN_NEEDS_ALTER if we have an incompatible collation
sql/handler.h:
  Added new prototypes
sql/sql_table.cc:
  - Mark that tables are opened for ALTER TABLE
  - If table needs to be upgraded, ensure we are not using online alter table.
sql/table.cc:
  - If we are using an old incompatible collation, change to use the new one and mark table as incompatible.
  - Give an error if we try to open an incompatible table.
sql/table.h:
  Added error that table needs to be rebuild
storage/connect/ha_connect.cc:
  Fixed compiler warning
strings/ctype-uca.c:
  New character sets
2013-11-09 00:20:07 +02:00
Alexander Barkov
3e1ae7a0be Recording correct test results:
mysql-test/suite/engines/funcs/r/db_alter_collate_ascii.result
  mysql-test/suite/engines/funcs/r/db_alter_collate_utf8.result
2013-11-06 17:55:22 +04:00
Elena Stepanova
f2b4305bd4 Merge 5.3->5.5 2012-08-02 04:22:43 +04:00
Elena Stepanova
d1a90e852b MDEV-369 (Mismatches in MySQL engines test suite)
Following reasons caused mismatches:
  - different handling of invalid values;
  - different CAST results with fractional seconds;
  - microseconds support in MariaDB;
  - different algorithm of comparing temporal values;
  - differences in error and warning texts and codes;
  - different approach to truncating datetime values to time;
  - additional collations;
  - different record order for queries without ORDER BY;
  - MySQL bug#66034.
More details in MDEV-369 comments.
2012-07-30 04:16:49 +04:00
Alexander Barkov
e6ce99e692 Merging from 5.1 2012-02-02 16:25:48 +04:00
Alexander Barkov
bbcc86f5d4 Postfix for Bug#11752408.
Recording correct test results.

modified:
  mysql-test/suite/engines/funcs/r/db_alter_collate_ascii.result
  mysql-test/suite/engines/funcs/r/db_alter_collate_utf8.result
2012-02-02 16:22:13 +04:00
Omer BarNir
8ac23986e7 Updates to test and result files in the 'engine' suites following changes in 5.5 2010-05-04 11:10:17 -07:00
Omer BarNir
c92b9b7315 Test suites for engine testing, moved from test-extra so will be available
for general use.


mysql-test/Makefile.am:
  Adding directories of additional test suites
mysql-test/mysql-stress-test.pl:
  Adding check for additional errors checking during test run
2010-03-17 23:42:07 -07:00