mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-08-07 00:04:31 +03:00

Author	SHA1	Message	Date
Oleksandr Byelkin	f1102da37a	Merge branch '11.8' into 12.0	2025-05-22 09:22:55 +02:00
Sergei Golubchik	237e24497b	Merge remote-tracking branch 'github/bb-11.4-release' into bb-11.8-serg	2025-04-27 19:40:00 +02:00
Alexander Barkov	10c063f9f0	MDEV-36213 Doubled memory usage (11.4.4 <-> 11.4.5) Fixing the code adding MySQL _0900_ collations as _uca1400_ aliases not to perform deep initialization of the corresponding _uca1400_ collations. Only basic initialization is now performed, which is enough to list these collations (both _0900_ and _uca1400_) in queries to INFORMATION_SCHEMA tables COLLATIONS and COLLATION_CHARACTER_SET_APPLICABILITY, as well as in SHOW COLLATION statements. Deep initialization is now performed only when a collation (either the _0900_ alias or the corresponding _uca1400_ collation) is used for the very first time after the server startup. Refactoring was done to make the code easier to maintain: - most of the _uca1400_ code was moved from ctype-uca.c to a new file ctype-uca1400.c - most of the _0900_ code was moved from ctype-uca.c to a new file ctype-uca0900.c Change details: - The original function add_alias_for_collation() added by the patch for "MDEV-20912 Add support for utf8mb4_0900_* collations in MariaDB Server" was removed from mysys/charset.c, as it had two problems: a. it forced deep initialization of the _uca1400_ collations when adding _0900_ aliases for them at the server startup (the main reported problem) b. the collation initialization code in add_alias_for_collation() was related more to collations rather than to memory management, so /strings should be a better place for it than /mysys. The code from add_alias_for_collation() was split into separate functions. Cyclic dependency was removed. `#include <my_sys.h>` was removed from /strings/ctype-uca.c. Collations are now added using a callback function MY_CHARSET_LOADER::add_collation, like it is done for user collations defined in Index.xml. The code in /mysys sets MY_CHARSET_LOADER::add_collation to add_compiled_collation(). - The function compare_collations() was removed.
A new virtual function was added into my_collation_handler_st instead: my_bool (*eq_collation)(CHARSET_INFO *self, CHARSET_INFO *other); because it is the collation handler who knows how to detect equal collations by comparing only some of CHARSET_INFO members without their deep initialization. Three implementations were added: - my_ci_eq_collation_uca() for UCA collations, it compares _0900_ collations as equal to their corresponding _uca1400_ collations. - my_ci_eq_collation_utf8mb4_bin(), it compares utf8mb4_nopad_bin and utf8mb4_0900_bin as equal. - my_ci_eq_collation_generic() - the default implementation, which compares all collations as not equal. A C++ wrapper CHARSET_INFO::eq_collations() was added. The code in /sql was changed to use the wrapper instead of the former calls for the removed function compare_collations(). - A part of add_alias_for_collation() was moved into a new function my_ci_alloc(). It allocates memory for a new charset_info_st instance together with the collation name and the comment using a single MY_CHARSET_LOADER::once_alloc call, which points to my_once_alloc() in the server. - A part of add_alias_for_collation() was moved into a new function my_ci_make_comment_for_alias(). It makes an "Alias for xxx" string, e.g. "Alias for utf8mb4_uca1400_swedish_ai_ci" in case of utf8mb4_sv_0900_ai_ci. - A part of the code in create_tailoring() was moved to a new function my_uca1400_collation_get_initialized_shared_uca(), to reuse the code between _uca1400_ and _0900_ collations. - A new function my_collation_id_is_mysql_uca0900() was added in addition to my_collation_id_is_mysql_uca1400().
- Functions to build collation names were added: my_uca0900_collation_build_name() my_uca1400_collation_build_name() - A shared function was added: my_bool my_uca1400_collation_alloc_and_init(MY_CHARSET_LOADER loader, LEX_CSTRING name, LEX_CSTRING comment, const uca_collation_def_param_t *param, uint id) It's reused to add _uca1400_ and _0900_ collations, with basic initialization (without deep initialization). - The function add_compiled_collation() changed its return type from void to int, to make it compatible with MY_CHARSET_LOADER::add_collation. - Functions mysql_uca0900_collation_definition_add(), mysql_uca0900_utf8mb4_collation_definitions_add(), mysql_utf8mb4_0900_bin_add() were added into ctype-uca0900.c. They get MY_CHARSET_LOADER as a parameter. - Functions my_uca1400_collation_definition_add(), my_uca1400_collation_definitions_add() were moved from charset-def.c to strings/ctype-uca1400.c. The latter now accepts MY_CHARSET_LOADER as the first parameter instead of initializing a MY_CHARSET_LOADER inside. - init_compiled_charsets() now initializes a MY_CHARSET_LOADER variable and passes it to all functions adding collations: - mysql_utf8mb4_0900_collation_definitions_add() - mysql_uca0900_utf8mb4_collation_definitions_add() - mysql_utf8mb4_0900_bin_add() - A new structure was added into ctype-uca.h: typedef struct uca_collation_def_param { my_cs_encoding_t cs_id; uint tailoring_id; uint nopad_flags; uint level_flags; } uca_collation_def_param_t; It simplifies reusing the code for _uca1400_ and _0900_ collations. - The definition of MY_UCA1400_COLLATION_DEFINITION was moved from ctype-uca.c to ctype-uca1400.h, to reuse the code for _uca1400_ and _0900_ collations. - The definitions of "MY_UCA_INFO my_uca_v1400" and "MY_UCA_INFO my_uca1400_info_tailored[][]" were moved from ctype-uca.c to ctype-uca1400.c.
- The definitions/declarations of: - mysql_0900_collation_start, - struct mysql_0900_to_mariadb_1400_mapping - mysql_0900_to_mariadb_1400_mapping - mysql_utf8mb4_0900_collation_definitions_add() were moved from ctype-uca.c to ctype-uca0900.c - Functions my_uca1400_make_builtin_collation_id() my_uca1400_collation_definition_init() my_uca1400_collation_id_uca400_compat() my_ci_get_collation_name_uca1400_context() were moved from ctype-uca.c to ctype-uca1400.c and ctype-uca1400.h - A part of my_uca1400_collation_definition_init() was moved into my_uca0520_builtin_collation_by_id(), to make functions smaller.	2025-04-17 10:01:53 +04:00
Sergey Vojtovich	c3f21762e9	Corrections to parent "speedup collation" commit Rather than populating collation_name_hash in a separate loop, call my_hash_insert() from appropriate methods.	2025-03-18 18:40:43 +04:00
Jitesh Chawla	543ebbcf8e	MDEV-35876 - speedup collation/charset lookup Replaces O(n) linear scans for collation lookups with O(1) hash lookups to eliminate performance bottlenecks as collation counts grow.	2025-03-18 18:40:43 +04:00
ParadoxV5	63b0ee26f7	Tag ALL `my_error_reporter`s with `ATTRIBUTE_FORMAT` The function pointer typedef `my_error_reporter` is already tagged. This commit inherits this attribute to all `my_getopt_error_reporter`s and `my_charset_error_reporter`s for consistency. (It future-proofs for deliberate direct uses of those functions.)	2025-02-12 10:17:44 +01:00
Sergei Golubchik	9ee09a33bb	Merge branch '11.7' into 11.8	2025-02-11 20:29:43 +01:00
Sergei Golubchik	ba01c2aaf0	Merge branch '11.4' into 11.7 * rpl.rpl_system_versioning_partitions updated for MDEV-32188 * innodb.row_size_error_log_warnings_3 changed error for MDEV-33658 (checks are done in a different order)	2025-02-06 16:46:36 +01:00
Alexander Barkov	89f5d28191	MDEV-22217 Make OS character sets "utf8" and "utf-8" map to MariaDB character set "utf8mb4" Map Unix utf8 locales to utf8mb4 instead of utf8mb3.	2025-01-22 11:45:32 +04:00
Monty	653f68784a	MDEV-35865 atomic.alter_table times out often The problem was that get_collation_number_internal() looped over all collations to find a collation by name. To look up utf8mb4_0900_ aliases it performed 22633 string comparisons at startup. Fixed by adding the MariaDB internal collation number to the "0900" alias lookup array. This is fine as collation numbers never change. Discussed-with: serg@mariadb.com	2025-01-18 10:41:43 +02:00
Marko Mäkelä	15700f54c2	Merge 11.4 into 11.7	2025-01-09 09:41:38 +02:00
Monty	7fcaab7aaa	MDEV-20912 Add support for utf8mb4_0900_* collations in MariaDB Server This is done by mapping most of the existing MySQL Unicode 0900 collations to MariaDB 1400 Unicode collations. The assumption is that 1400 is a superset of 0900 for all practical purposes. I also added a new function 'compare_collations()' and changed most code to use this instead of comparing character sets directly. This enables one to seamlessly mix-and-match the corresponding 0900 and 1400 sets. Field comparison and ALTER TABLE treat the character sets as identical. All MySQL 8.0 0900 collations are supported except: - utf8mb4_ja_0900_as_cs - utf8mb4_ja_0900_as_cs_ks - utf8mb4_ru_0900_as_cs - utf8mb4_zh_0900_as_cs These do not have corresponding entries in the MariaDB 1400 collations. Other things: - Added COMMENT column to information_schema.collations. For utf8mb4_0900 collations it contains the corresponding alias collation.	2024-12-28 10:23:49 +02:00
Marko Mäkelä	33907f9ec6	Merge 11.4 into 11.7	2024-12-02 17:51:17 +02:00
Marko Mäkelä	3d23adb766	Merge 10.6 into 10.11	2024-11-29 13:43:17 +02:00
Marko Mäkelä	7d4077cc11	Merge 10.5 into 10.6	2024-11-29 12:37:46 +02:00
Brandon Nesterenko	840fe316d4	MDEV-34348: my_hash_get_key fixes Partial commit of the greater MDEV-34348 scope. MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict Change the type of my_hash_get_key to: 1) Return const 2) Change the context parameter to be const void* Also fix casting in hash adjacent areas. Reviewed By: Marko Mäkelä <marko.makela@mariadb.com>	2024-11-23 08:14:22 -07:00
Alexander Barkov	fd247cc21f	MDEV-31340 Remove MY_COLLATION_HANDLER::strcasecmp() This patch also fixes: MDEV-33050 Build-in schemas like oracle_schema are accent insensitive MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not work well with lower-case-table-names=0 MDEV-33085 Tables T1 and t1 do not work well with ENGINE=CSV and lower-case-table-names=0 MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case insensitive with lower-case-table-names=0 MDEV-33088 Cannot create triggers in the database `MYSQL` MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not case sensitive with lower-case-table-names=0 MDEV-33109 DROP DATABASE MYSQL -- does not drop SP with lower-case-table-names=0 MDEV-33110 HANDLER commands are case insensitive with lower-case-table-names=0 MDEV-33119 User is case insensitive in INFORMATION_SCHEMA.VIEWS MDEV-33120 System log table names are case insensitive with lower-case-table-names=0 - Removing the virtual function strcasecmp() from MY_COLLATION_HANDLER - Adding a wrapper function CHARSET_INFO::streq(), to compare two strings for equality. For now it calls strnncoll() internally. In the future it will turn into a virtual function. - Adding new accent sensitive case insensitive collations: - utf8mb4_general1400_as_ci - utf8mb3_general1400_as_ci They implement accent sensitive case insensitive comparison. The weight of a character is equal to the code point of its upper case variant. These collations use Unicode-14.0.0 casefolding data. The result of my_charset_utf8mb3_general1400_as_ci.strcoll() is very close to the former my_charset_utf8mb3_general_ci.strcasecmp() There is only a difference in a couple dozen rare characters, because of: - the switch from "tolower" to "toupper" comparison, to make utf8mb3_general1400_as_ci closer to utf8mb3_general_ci - the switch from Unicode-3.0.0 to Unicode-14.0.0 This difference should be tolerable. See the list of affected characters in the MDEV description. Note, utf8mb4_general1400_as_ci correctly handles non-BMP characters!
Unlike utf8mb4_general_ci, it does not treat all non-BMP characters as equal. - Adding classes representing names of the file based database objects: Lex_ident_db Lex_ident_table Lex_ident_trigger Their comparison collation depends on the underlying file system case sensitivity and on --lower-case-table-names and can be either my_charset_bin or my_charset_utf8mb3_general1400_as_ci. - Adding classes representing names of other database objects, whose names have case insensitive comparison style, using my_charset_utf8mb3_general1400_as_ci: Lex_ident_column Lex_ident_sys_var Lex_ident_user_var Lex_ident_sp_var Lex_ident_ps Lex_ident_i_s_table Lex_ident_window Lex_ident_func Lex_ident_partition Lex_ident_with_element Lex_ident_rpl_filter Lex_ident_master_info Lex_ident_host Lex_ident_locale Lex_ident_plugin Lex_ident_engine Lex_ident_server Lex_ident_savepoint Lex_ident_charset engine_option_value::Name - All the mentioned Lex_ident_xxx classes implement a method streq(): if (ident1.streq(ident2)) do_equal(); This method works as a wrapper for CHARSET_INFO::streq(). - Changing a lot of "LEX_CSTRING name" to "Lex_ident_xxx name" in class members and in function/method parameters. - Replacing all calls like system_charset_info->coll->strcasecmp(ident1, ident2) with ident1.streq(ident2) - Taking advantage of the C++11 user defined literal operator for LEX_CSTRING (see m_strings.h) and Lex_ident_xxx (see lex_ident.h) data types. Use example: const Lex_ident_column primary_key_name= "PRIMARY"_Lex_ident_column; is now a shorter version of: const Lex_ident_column primary_key_name= Lex_ident_column({STRING_WITH_LEN("PRIMARY")});	2024-04-18 15:22:10 +04:00
Alexander Barkov	7f6b648d7d	MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 String length growth during upper/lower conversion in Unicode collations depends only on the underlying MY_UNICASE_INFO used in the collation. Maintaining separate members CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply duplicated this information and caused bugs like this (when MY_UNICASE_INFO and case??_multiply went out of sync because of incomplete CHARSET_INFO initialization). Fix: Changing CHARSET_INFO::caseup_multiply and CHARSET_INFO::casedn_multiply from members to virtual functions. The virtual functions in Unicode collations calculate case conversion growth factors from the MY_UNICASE_INFO. This guarantees that the growth factors are always in sync with the MY_UNICASE_INFO.	2023-02-17 17:33:27 +04:00
Marko Mäkelä	345356b868	Merge 10.9 into 10.10	2023-02-16 11:36:38 +02:00
Marko Mäkelä	dbab3e8d90	Merge 10.6 into 10.8	2023-02-10 13:43:53 +02:00
Marko Mäkelä	6aec87544c	Merge 10.5 into 10.6	2023-02-10 13:03:01 +02:00
Marko Mäkelä	c41c79650a	Merge 10.4 into 10.5	2023-02-10 12:02:11 +02:00
Alexander Barkov	0845bce0d9	MDEV-30556 UPPER() returns an empty string for U+0251 in Unicode-5.2.0+ collations for utf8	2023-02-03 18:18:32 +04:00
Alexander Barkov	133446828c	MDEV-27009 Add UCA-14.0.0 collations - Added one neutral and 22 tailored (language specific) collations based on Unicode Collation Algorithm version 14.0.0. Collations were added for Unicode character sets utf8mb3, utf8mb4, ucs2, utf16, utf32. Every tailoring was added with four accent and case sensitivity flag combinations, e.g: * utf8mb4_uca1400_swedish_as_cs * utf8mb4_uca1400_swedish_as_ci * utf8mb4_uca1400_swedish_ai_cs * utf8mb4_uca1400_swedish_ai_ci and their _nopad_ variants: * utf8mb4_uca1400_swedish_nopad_as_cs * utf8mb4_uca1400_swedish_nopad_as_ci * utf8mb4_uca1400_swedish_nopad_ai_cs * utf8mb4_uca1400_swedish_nopad_ai_ci - Introducing a concept of contextually typed named collations: CREATE DATABASE db1 CHARACTER SET utf8mb4; CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci); The idea is that there is no need to specify the character set prefix in the new collation names. It's enough to type just the suffix "uca1400_as_ci". The character set is taken from the context. In the above example script the context character set is utf8mb4. So the CREATE TABLE will make a column with the collation utf8mb4_uca1400_as_ci. Short collation names can be used in any part of the SQL syntax where the COLLATE clause is understood.
- New collations are displayed only one time (without character set combinations) by these statements: SELECT * FROM INFORMATION_SCHEMA.COLLATIONS; SHOW COLLATION; For example, all these collations: - utf8mb3_uca1400_swedish_as_ci - utf8mb4_uca1400_swedish_as_ci - ucs2_uca1400_swedish_as_ci - utf16_uca1400_swedish_as_ci - utf32_uca1400_swedish_as_ci have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION, with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix without the character set name: SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci'; +-----------------------+ \| COLLATION_NAME \| +-----------------------+ \| uca1400_swedish_as_ci \| +-----------------------+ Note, the behaviour of old collations did not change. Non-unicode collations (e.g. latin1_swedish_ci) and old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci) are still displayed with the character set prefix, as before. - The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed. The NOT NULL constraint was removed from these columns: - CHARACTER_SET_NAME - ID - IS_DEFAULT and from the corresponding columns in SHOW COLLATION. For example: SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci'; +-----------------------+--------------------+------+------------+ \| COLLATION_NAME \| CHARACTER_SET_NAME \| ID \| IS_DEFAULT \| +-----------------------+--------------------+------+------------+ \| uca1400_swedish_as_ci \| NULL \| NULL \| NULL \| +-----------------------+--------------------+------+------------+ The NULL value in these columns now means that the collation is applicable to multiple character sets. The behaviour of old collations did not change. Make sure your client programs can handle NULL values in these columns.
- The structure of the table INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed. Three new NOT NULL columns were added: - FULL_COLLATION_NAME - ID - IS_DEFAULT New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY. The column COLLATION_NAME contains the collation name without the character set prefix. The column FULL_COLLATION_NAME contains the collation name with the character set prefix. Old collations have full collation name in both FULL_COLLATION_NAME and COLLATION_NAME. SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4\|latin1).swedish.ci$'; +-----------------------------+-------------------------------------+--------------------+------+------------+ \| COLLATION_NAME \| FULL_COLLATION_NAME \| CHARACTER_SET_NAME \| ID \| IS_DEFAULT \| +-----------------------------+-------------------------------------+--------------------+------+------------+ \| latin1_swedish_ci \| latin1_swedish_ci \| latin1 \| 8 \| Yes \| \| latin1_swedish_nopad_ci \| latin1_swedish_nopad_ci \| latin1 \| 1032 \| \| \| utf8mb4_swedish_ci \| utf8mb4_swedish_ci \| utf8mb4 \| 232 \| \| \| uca1400_swedish_ai_ci \| utf8mb4_uca1400_swedish_ai_ci \| utf8mb4 \| 2368 \| \| \| uca1400_swedish_as_ci \| utf8mb4_uca1400_swedish_as_ci \| utf8mb4 \| 2370 \| \| \| uca1400_swedish_nopad_ai_ci \| utf8mb4_uca1400_swedish_nopad_ai_ci \| utf8mb4 \| 2372 \| \| \| uca1400_swedish_nopad_as_ci \| utf8mb4_uca1400_swedish_nopad_as_ci \| utf8mb4 \| 2374 \| \| +-----------------------------+-------------------------------------+--------------------+------+------------+ - Other INFORMATION_SCHEMA queries: SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS; SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS; SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES; SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA; SELECT 
COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS; display full collation names, including the character set prefix, for all collations, including new collations. Corresponding SHOW commands also display full collation names in collation related columns: SHOW CREATE TABLE t1; SHOW CREATE DATABASE db1; SHOW TABLE STATUS; SHOW CREATE FUNCTION f1; SHOW CREATE PROCEDURE p1; SHOW CREATE EVENT ev1; SHOW CREATE TRIGGER tr1; SHOW CREATE VIEW; These INFORMATION_SCHEMA queries and SHOW statements may change in the future, to display short collation names.	2022-08-10 15:04:24 +02:00
Vladislav Vaintroub	9ea83f7fbd	MDEV-26713 set console codepage to what user set in --default-character-set If someone for whatever reason uses --default-character-set=cp850, this will avoid incorrect display and insertion of incorrect data. Adjusting the console codepage sometimes also needs to happen with --default-charset=auto, on older Windows. This is because autodetection is not always exact. For example, the console codepage on US editions of Windows is 437. The client autodetects it as cp850, a rather loose approximation, given 46 code point differences. We change the console codepage to cp850, so that there is no discrepancy. That fix is currently Windows-only, and serves people who used a combination of chcp to achieve a WYSIWYG effect (although this would most likely have been used with utf8 in the past). Now, --default-character-set would be a replacement for that. Fix fs_character_set() detection of current codepage.	2021-12-15 19:13:57 +01:00
Vladislav Vaintroub	a4fc41b6b4	MDEV-26713 Treat codepage 65001 as utf8mb4, not utf8mb3 Also, fix the "UTF8" option in MSI, which is responsible for character-set-server setting	2021-12-15 19:13:57 +01:00
Vladislav Vaintroub	ba9d231b5a	MDEV-26713 Set activeCodePage=UTF8 for windows programs - Use corresponding entry in the manifest, as described in https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page - If ANSI codepage is UTF8 (i.e. for Windows 1903 and later) Use UTF8 as default client charset Set console codepage(s) to UTF8, in case process is using console - Allow some previously disabled MTR tests, that used Unicode in "exec", for the recent Windows versions	2021-12-15 19:13:57 +01:00
Vladislav Vaintroub	0102732686	Revert "MDEV-26713 Windows - improve utf8 support for command line tools" This reverts several commits pushed by mistake.	2021-11-19 09:46:57 +01:00
Vladislav Vaintroub	012d3cecb8	MDEV-26713 Windows - improve utf8 support for command line tools	2021-11-18 17:25:40 +01:00
Monty	a206658b98	Change CHARSET_INFO character set and collation names to LEX_CSTRING This change removed 68 explicit strlen() calls from the code. The following renames were done to ensure we don't use the old names when merging code from earlier releases, as using the new variables for print function could result in crashes: - charset->csname renamed to charset->cs_name - charset->name renamed to charset->coll_name Almost everything were mechanical changes except: - Changed to use the new Protocol::store(LEX_CSTRING..) when possible - Changed to use field->store(LEX_CSTRING, CHARSET_INFO) when possible - Changed to use String->append(LEX_CSTRING&) when possible Other things: - There were compiler issues with ensuring that all character set names point to the same string: gcc doesn't allow one to use integer constants when defining global structures (constant char * pointers work fine). To get around this, I declared defines for each character set name length.	2021-05-19 22:54:07 +02:00
Monty	b6ff139aa3	Reduce usage of strlen() Changes: - To detect automatic strlen() I removed the methods in String that use 'const char *' without a length: - String::append(const char*) - Binary_string(const char *str) - String(const char *str, CHARSET_INFO *cs) - append_for_single_quote(const char *) All usage of append(const char*) is changed to either use String::append(char), String::append(const char*, size_t length) or String::append(LEX_CSTRING) - Added STRING_WITH_LEN() around constant string arguments to String::append() - Added overflow argument to escape_string_for_mysql() and escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow. This was needed as most usage of the above functions never tested the result for -1 and would have given wrong results or crashes in case of overflows. - Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING. Changed all Item_func::func_name()'s to func_name_cstring()'s. The old Item_func_or_sum::func_name() is now an inline function that returns func_name_cstring().str. - Changed Item::mode_name() and Item::func_name_ext() to return LEX_CSTRING. - Changed for some functions the name argument from const char * to const LEX_CSTRING &: - Item::Item_func_fix_attributes() - Item::check_type_...() - Type_std_attributes::agg_item_collations() - Type_std_attributes::agg_item_set_converter() - Type_std_attributes::agg_arg_charsets...() - Type_handler_hybrid_field_type::aggregate_for_result() - Type_handler_geometry::check_type_geom_or_binary() - Type_handler::Item_func_or_sum_illegal_param() - Predicant_to_list_comparator::add_value_skip_null() - Predicant_to_list_comparator::add_value() - cmp_item_row::prepare_comparators() - cmp_item_row::aggregate_row_elements_for_comparison() - Cursor_ref::print_func() - Removed String_space() as it was only used in one case and that could be simplified to not use String_space(), thanks to the fixed my_vsnprintf().
- Added some const LEX_CSTRING's for common strings: - NULL_clex_str, DATA_clex_str, INDEX_clex_str. - Changed primary_key_name to a LEX_CSTRING - Renamed String::set_quick() to String::set_buffer_if_not_allocated() to clarify what the function really does. - Rename of protocol function: bool store(const char *from, CHARSET_INFO *cs) to bool store_string_or_null(const char *from, CHARSET_INFO *cs). This was done both to clarify the difference between the 'store' functions and to make it easier to find suboptimal usage of store() calls. - Added Protocol::store(const LEX_CSTRING, CHARSET_INFO) - Changed some 'const char' arrays to instead be of type LEX_CSTRING. - class Item_func_units now uses LEX_CSTRING for name. Other things: - Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character in the prompt would cause some part of the prompt to be duplicated. - Fixed a lot of instances where the length of the argument to append is known or easily obtained but was not used. - Removed some unneeded 'virtual' definitions for functions that were inherited from the parent. I added override to these. - Fixed Ordered_key::print() to preallocate the needed buffer. Old code could cause memory overruns. - Simplified some loops when adding char to a String with delimiters.	2021-05-19 22:27:48 +02:00
Rucha Deodhar	2fdb556e04	MDEV-8334: Rename utf8 to utf8mb3 This patch changes the main name of 3 byte character set from utf8 to utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default, so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.	2021-05-19 06:48:36 +02:00
Monty	dbcd3384e0	MDEV-7947 strcmp() takes 0.37% in OLTP RO This patch ensures that all identical character sets share the same cs->csname. This allows us to replace strcmp() in my_charset_same() with comparisons of pointers. This fixes a long standing performance issue that could cause a strcmp() for every item sent through the protocol class to the end user. One consequence of this patch is that we don't allow one to add a character definition in the Index.xml file that changes the csname of an existing character set. This is by design as changing character set names of existing ones is extremely dangerous, especially as some storage engines just record character set numbers. As we now have a hash over character set's csname, we can in the future use that for faster access to a specific character set. This could be done by changing the hash to non unique and using the hash to find the next character set with the same csname.	2020-07-23 10:54:33 +03:00
Sergei Golubchik	7c58e97bf6	perfschema memory related instrumentation changes	2020-03-10 19:24:22 +01:00
Alexander Barkov	f1e13fdc8d	MDEV-21581 Helper functions and methods for CHARSET_INFO	2020-01-28 12:29:23 +04:00
Alexander Barkov	3e7e87ddcc	MDEV-19897 Rename source code variable names from utf8 to utf8mb3	2019-06-28 12:37:04 +04:00
Vladislav Vaintroub	5804bb4ef0	MDEV-19750 mysql command wrong encoding Restore the detection of default charset in command line utilities. It worked up to 10.1, but was broken by Connector/C. Moved code for detection of default charset from sql-common/client.c to mysys, and made command line utilities use this code if charset was not specified on the command line.	2019-06-17 18:04:47 +01:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	cb248f8806	Merge branch '5.5' into 10.1	2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru	5543b75550	Update FSF Address * Update wrong zip-code	2019-05-11 21:29:06 +03:00
Alexander Barkov	0259b3cbbe	MDEV-11255 LDML: allow defining 2-level UCA collations	2016-11-08 20:57:19 +04:00
Alexander Barkov	0f8a1a314d	MDEV-10877 xxx_unicode_nopad_ci collations	2016-09-23 14:19:07 +04:00
Alexander Barkov	ee19806b8e	MDEV-9711 NO PAD collations Based on the patch from Daniil Medvedev (a Google Summer of Code task)	2016-09-06 12:50:02 +04:00
Alexander Barkov	e4f6fd5e12	MDEV-10743 LDML: a new syntax to reuse sort order from another 8bit simple collation	2016-09-06 12:37:11 +04:00
Alexander Barkov	1ca595fbf7	LDML refactoring for "MDEV-9711 NO PAD collations" - Moving detection of the MY_CS_CSSORT, MY_CS_PUREASCII, MY_CS_NONASCII flags of loadable collations from add_collation() in mysys.c to my_cset_init_8bit() and my_coll_init_simple() in ctype-simple.c. - Adding tests that these flags are set properly for loadable collations - Moving LDML test related .xml files from mysql-test/std_data/ to mysql-test/std_data/ldml/, as there will be more .xml test files	2016-09-03 09:05:56 +04:00
Alexander Barkov	e7ff281d2e	MDEV-6353 my_ismbchar() and my_mbcharlen() refactoring	2016-05-17 15:27:10 +04:00
Alexander Barkov	3fc6a8b832	MDEV-9811 LOAD DATA INFILE does not work well with gbk in some cases MDEV-9824 LOAD DATA does not work with multi-byte strings in LINES TERMINATED BY when IGNORE is specified	2016-03-31 14:22:25 +04:00
Alexander Barkov	22a64047d1	MDEV-6274 Collation usage statistics Adding collation usage statistics into the feedback plugin I_S table.	2014-08-11 05:45:45 +04:00
Michael Widenius	192678e7bf	MDEV-5241: Collation incompatibilities with MySQL-5.6 - Character set code & tests from Alexander Barkov - Integration with ALTER TABLE, REPAIR and open_table from Monty The problem was that MySQL 5.6 added some Croatian and Vietnamese character set collations that are incompatible with MariaDB. The fix is to move the MariaDB conflicting collation numbers out of the region that MySQL is likely to use. mysql_upgrade, REPAIR TABLE or ALTER TABLE will fix the collations. If one tries to access an old incompatible table, one will get the error "Table upgrade required...." After this patch, MariaDB supports all the MySQL character set collations and the old MariaDB Croatian collations, which are closer to the latest standard than the MySQL versions. New character sets: ucs2_croatian_mysql561_uca_ci utf8_croatian_mysql561_uca_ci utf16_croatian_mysql561_uca_ci utf32_croatian_mysql561_uca_ci utf8mb4_croatian_mysql561_uca_ci Other things: - Fixed some compiler warnings - mysql_upgrade prints information about repaired tables.
- Increased version number VERSION: Increased VERSION number client/mysqlcheck.c: Print repaired table name when using --verbose include/m_ctype.h: Add new MariaDB collation regions that are not likely to conflict with MySQL include/my_base.h: Added flag to detect if table was opened for ALTER TABLE mysql-test/r/ctype_ldml.result: Updated result mysql-test/r/ctype_uca.result: Updated result mysql-test/r/ctype_upgrade.result: Updated result mysql-test/r/ctype_utf16_uca.result: Updated result mysql-test/r/ctype_utf32_uca.result: Updated result mysql-test/r/ctype_utf8mb4_uca.result: Updated result mysql-test/std_data/ctype_upgrade: Test files for testing upgrading of conflicting collations mysql-test/suite/engines/funcs/r/db_alter_collate_ascii.result: New collations added mysql-test/suite/engines/funcs/r/db_alter_collate_utf8.result: New collations added mysql-test/suite/innodb/r/innodb_ctype_ldml.result: Updated test result mysql-test/suite/innodb/t/innodb_ctype_ldml.test: Updated test result mysql-test/suite/plugins/r/show_all_plugins.result: Updated version number mysql-test/suite/roles/create_and_drop_role_invalid_user_table.result: Updated version number mysql-test/t/ctype_ldml.test: Updated test mysql-test/t/ctype_uca.test: Testing of new collations mysql-test/t/ctype_upgrade.test: Testing of upgrading tables with old collations The test ensures that: - We will get an error if we try to open a table with old collations. - CHECK TABLE will detect that the table needs to be upgraded. - ALTER TABLE and REPAIR will fix the table. 
- mysql_upgrade works as expected mysql-test/t/ctype_utf16_uca.test: Testing of new collations mysql-test/t/ctype_utf32_uca.test: Testing of new collations mysql-test/t/ctype_utf8mb4_uca.test: Testing of new collations mysys/charset-def.c: Added new character sets mysys/charset.c: Always give an error, if requested, if a character set didn't exist sql/handler.cc: - Added upgrade_collation() to check if collation is compatible with old version - check_collation_compatibility() checks if we are using an old collation from MariaDB 5.5 or MySQL 5.6 - ha_check_for_upgrade() returns HA_ADMIN_NEEDS_ALTER if we have an incompatible collation sql/handler.h: Added new prototypes sql/sql_table.cc: - Mark that tables are opened for ALTER TABLE - If table needs to be upgraded, ensure we are not using online alter table. sql/table.cc: - If we are using an old incompatible collation, change to use the new one and mark table as incompatible. - Give an error if we try to open an incompatible table. sql/table.h: Added error that table needs to be rebuilt storage/connect/ha_connect.cc: Fixed compiler warning strings/ctype-uca.c: New character sets	2013-11-09 00:20:07 +02:00
Alexander Barkov	426d246f5b	MDEV-5163 Merge WEIGHT_STRING function from MySQL-5.6	2013-10-23 20:25:52 +04:00


256 Commits