mariadb

mirror of https://github.com/MariaDB/server.git synced 2025-11-10 23:02:54 +03:00

Author	SHA1	Message	Date
NTH19	80fbd0ee94	Remove redundant variable (#2237)	2022-08-27 10:19:16 +01:00
Alexander Barkov	133446828c	MDEV-27009 Add UCA-14.0.0 collations - Added one neutral and 22 tailored (language specific) collations based on Unicode Collation Algorithm version 14.0.0. Collations were added for Unicode character sets utf8mb3, utf8mb4, ucs2, utf16, utf32. Every tailoring was added with four accent and case sensitivity flag combinations, e.g.: * utf8mb4_uca1400_swedish_as_cs * utf8mb4_uca1400_swedish_as_ci * utf8mb4_uca1400_swedish_ai_cs * utf8mb4_uca1400_swedish_ai_ci and their _nopad_ variants: * utf8mb4_uca1400_swedish_nopad_as_cs * utf8mb4_uca1400_swedish_nopad_as_ci * utf8mb4_uca1400_swedish_nopad_ai_cs * utf8mb4_uca1400_swedish_nopad_ai_ci - Introducing the concept of contextually typed named collations: CREATE DATABASE db1 CHARACTER SET utf8mb4; CREATE TABLE db1.t1 (a CHAR(10) COLLATE uca1400_as_ci); The idea is that there is no need to specify the character set prefix in the new collation names. It's enough to type just the suffix "uca1400_as_ci". The character set is taken from the context. In the above example script the context character set is utf8mb4. So the CREATE TABLE will make a column with the collation utf8mb4_uca1400_as_ci. Short collation names can be used in any parts of the SQL syntax where the COLLATE clause is understood. 
- New collations are displayed only once (without character set combinations) by these statements: SELECT * FROM INFORMATION_SCHEMA.COLLATIONS; SHOW COLLATION; For example, all these collations: - utf8mb3_uca1400_swedish_as_ci - utf8mb4_uca1400_swedish_as_ci - ucs2_uca1400_swedish_as_ci - utf16_uca1400_swedish_as_ci - utf32_uca1400_swedish_as_ci have just one entry in INFORMATION_SCHEMA.COLLATIONS and SHOW COLLATION, with COLLATION_NAME equal to "uca1400_swedish_as_ci", which is the suffix without the character set name: SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci'; +-----------------------+ \| COLLATION_NAME \| +-----------------------+ \| uca1400_swedish_as_ci \| +-----------------------+ Note, the behaviour of old collations did not change. Non-Unicode collations (e.g. latin1_swedish_ci) and old UCA-4.0.0 collations (e.g. utf8mb4_unicode_ci) are still displayed with the character set prefix, as before. - The structure of the table INFORMATION_SCHEMA.COLLATIONS was changed. The NOT NULL constraint was removed from these columns: - CHARACTER_SET_NAME - ID - IS_DEFAULT and from the corresponding columns in SHOW COLLATION. For example: SELECT COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE '%uca1400_swedish_as_ci'; +-----------------------+--------------------+------+------------+ \| COLLATION_NAME \| CHARACTER_SET_NAME \| ID \| IS_DEFAULT \| +-----------------------+--------------------+------+------------+ \| uca1400_swedish_as_ci \| NULL \| NULL \| NULL \| +-----------------------+--------------------+------+------------+ The NULL value in these columns now means that the collation is applicable to multiple character sets. The behaviour of old collations did not change. Make sure your client programs can handle NULL values in these columns. 
- The structure of the table INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY was changed. Three new NOT NULL columns were added: - FULL_COLLATION_NAME - ID - IS_DEFAULT New collations have multiple entries in COLLATION_CHARACTER_SET_APPLICABILITY. The column COLLATION_NAME contains the collation name without the character set prefix. The column FULL_COLLATION_NAME contains the collation name with the character set prefix. Old collations have full collation name in both FULL_COLLATION_NAME and COLLATION_NAME. SELECT COLLATION_NAME, FULL_COLLATION_NAME, CHARACTER_SET_NAME, ID, IS_DEFAULT FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY WHERE FULL_COLLATION_NAME RLIKE '^(utf8mb4\|latin1).swedish.ci$'; +-----------------------------+-------------------------------------+--------------------+------+------------+ \| COLLATION_NAME \| FULL_COLLATION_NAME \| CHARACTER_SET_NAME \| ID \| IS_DEFAULT \| +-----------------------------+-------------------------------------+--------------------+------+------------+ \| latin1_swedish_ci \| latin1_swedish_ci \| latin1 \| 8 \| Yes \| \| latin1_swedish_nopad_ci \| latin1_swedish_nopad_ci \| latin1 \| 1032 \| \| \| utf8mb4_swedish_ci \| utf8mb4_swedish_ci \| utf8mb4 \| 232 \| \| \| uca1400_swedish_ai_ci \| utf8mb4_uca1400_swedish_ai_ci \| utf8mb4 \| 2368 \| \| \| uca1400_swedish_as_ci \| utf8mb4_uca1400_swedish_as_ci \| utf8mb4 \| 2370 \| \| \| uca1400_swedish_nopad_ai_ci \| utf8mb4_uca1400_swedish_nopad_ai_ci \| utf8mb4 \| 2372 \| \| \| uca1400_swedish_nopad_as_ci \| utf8mb4_uca1400_swedish_nopad_as_ci \| utf8mb4 \| 2374 \| \| +-----------------------------+-------------------------------------+--------------------+------+------------+ - Other INFORMATION_SCHEMA queries: SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS; SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.PARAMETERS; SELECT TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES; SELECT DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA; SELECT 
COLLATION_NAME FROM INFORMATION_SCHEMA.ROUTINES; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.EVENTS; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.EVENTS; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.ROUTINES; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.ROUTINES; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.TRIGGERS; SELECT DATABASE_COLLATION FROM INFORMATION_SCHEMA.TRIGGERS; SELECT COLLATION_CONNECTION FROM INFORMATION_SCHEMA.VIEWS; display full collation names, including the character set prefix, for all collations, including new collations. Corresponding SHOW commands also display full collation names in collation related columns: SHOW CREATE TABLE t1; SHOW CREATE DATABASE db1; SHOW TABLE STATUS; SHOW CREATE FUNCTION f1; SHOW CREATE PROCEDURE p1; SHOW CREATE EVENT ev1; SHOW CREATE TRIGGER tr1; SHOW CREATE VIEW; These INFORMATION_SCHEMA queries and SHOW statements may change in the future, to display short collation names.	2022-08-10 15:04:24 +02:00
Oleksandr Byelkin	f5c5f8e41e	Merge branch '10.5' into 10.6	2022-02-03 17:01:31 +01:00
Oleksandr Byelkin	cf63eecef4	Merge branch '10.4' into 10.5	2022-02-01 20:33:04 +01:00
Alexander Barkov	b915f79e4e	MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings	2022-01-21 12:16:07 +04:00
Alexander Barkov	0d68b0a2d6	MDEV-26669 Add MY_COLLATION_HANDLER functions min_str() and max_str()	2021-09-27 17:10:22 +04:00
Marko Mäkelä	80ed136e6d	Merge 10.4 into 10.5	2021-04-21 09:01:01 +03:00
Monty	031f11717d	Fix all warnings given by UBSAN The easiest way to compile and test the server with UBSAN is to run: ./BUILD/compile-pentium64-ubsan and then run mysql-test-run. After this commit, one should be able to run this without any UBSAN warnings. There are still a few compiler warnings that should be fixed at some point, but these do not expose any real bugs. The 'special' cases where we disable, suppress or circumvent UBSAN are: - ref10 source (as here we intentionally do some shifts that UBSAN complains about). - x86 version of optimized int#korr() methods. UBSAN does not like unaligned memory access of integers. Fixed by using byte_order_generic.h when compiling with UBSAN - We use smaller thread stack with ASAN and UBSAN, which forced me to disable a few tests that print the thread stack size. - Verifying class types does not work for shared libraries. I added suppression in mysql-test-run.pl for this case. - Added '#ifdef WITH_UBSAN' when using integer arithmetic where it is safe to have overflows (two cases, in item_func.cc). Things fixed: - Don't left shift signed values (byte_order_generic.h, mysqltest.c, item_sum.cc and many more) - Don't assign non-existing values to enum variables. - Ensure that bool and enum values are properly initialized in constructors. This was needed as UBSAN checks that these types have correct values when one copies an object. (gcalc_tools.h, ha_partition.cc, item_sum.cc, partition_element.h ...) - Ensure we do not call handler functions on unallocated objects or deleted objects. (events.cc, sql_acl.cc). - Fixed bugs in Item_sp::Item_sp() where we did not call constructor on Query_arena object. - Fixed several casts of objects to an incompatible class! (Item.cc, Item_buff.cc, item_timefunc.cc, opt_subselect.cc, sql_acl.cc, sql_select.cc ...) - Ensure we do not do integer arithmetic that causes over or underflows. This includes also ++ and -- of integers. 
(Item_func.cc, Item_strfunc.cc, item_timefunc.cc, sql_base.cc ...) - Added JSON_VALUE_UNITIALIZED to json_value_types and ensure that value_type is initialized to this instead of to -1, which is not a valid enum value for json_value_types. - Ensure we do not call memcpy() when second argument could be null. - Fixed that Item_func_str::make_empty_result() creates an empty string instead of a null string (safer as it ensures we do not do arithmetic on null strings). Other things: - Changed struct st_position to an OBJECT and added an initialization function to it to ensure that we do not copy or use uninitialized members. The change to a class was also motivated by the fact that we used "struct st_position" and POSITION randomly through the code, which was confusing. - Notably big rewrite in sql_acl.cc to avoid using deleted objects. - Changed in sql_partition to use '^' instead of '-'. This is safe as the operator is either 0 or 0x8000000000000000ULL. - Added check for select_nr < INT_MAX in JOIN::build_explain() to avoid bug when get_select() could return NULL. - Reordered elements in POSITION for better alignment. - Changed sql_test.cc::print_plan() to use pointers instead of objects. - Fixed bug in find_set() where we could execute '1 << -1'. - Added variable have_sanitizer, used by mtr. (This variable was before only in 10.5 and up). It can now have one of two values: ASAN or UBSAN. - Moved ~Archive_share() from ha_archive.cc to ha_archive.h and marked it virtual. This was an effort to get UBSAN to work with loaded storage engines. I kept the change as the new place is better. - Added in CONNECT engine COLBLK::SetName(), to get around a wrong cast in tabutil.cpp. - Added HAVE_REPLICATION around usage of rgi_slave, to get embedded server to compile with UBSAN. (Patch from Marko). - Added #ifdef for powerpc64 to avoid a bug in old gcc versions related to integer arithmetic. 
Changes that should not be needed but had to be done to suppress warnings from UBSAN: - Added static_cast<uint16_t> around shift to get rid of a LOT of compiler warnings when using UBSAN. - Had to change some '/' of 2 base integers to shift to get rid of some compile time warnings. Reviewed by: - Json changes: Alexey Botchkov - Charset changes in ctype-uca.c: Alexander Barkov - InnoDB changes & Embedded server: Marko Mäkelä - sql_acl.cc changes: Vicențiu Ciorbaru - build_explain() changes: Sergey Petrunia	2021-04-20 12:30:09 +03:00
Alexander Barkov	cfe5ee90c8	MDEV-22043 Special character leads to assertion in my_wc_to_printable_generic on 10.5.2 (debug) The code did not take into account that: - U+005C (backslash) can occupy more than mbminlen characters (e.g. in sjis) - Some character sets do not have a code for U+005C (e.g. swe7) Adding a new function my_wc_to_printable into MY_CHARSET_HANDLER to cover all special cases easier.	2020-05-09 16:01:30 +04:00
Marko Mäkelä	8b6cfda631	Merge 10.4 into 10.5	2020-02-07 08:51:20 +02:00
Monty	4d61f1247a	Fixed compiler warnings from gcc 7.4.1 - Fixed possible error in rocksdb/rdb_datadic.cc	2020-01-29 23:23:55 +02:00
Alexander Barkov	f1e13fdc8d	MDEV-21581 Helper functions and methods for CHARSET_INFO	2020-01-28 12:29:23 +04:00
Marko Mäkelä	5ab70e7f68	Merge 10.2 into 10.3	2019-12-27 15:14:48 +02:00
Marko Mäkelä	73985d8301	Merge 10.1 into 10.2	2019-12-23 07:14:51 +02:00
Alexander Barkov	3d98892232	Merge remote-tracking branch 'origin/5.5' into 10.1	2019-12-16 13:08:17 +04:00
Alexander Barkov	fc860d3fa3	MDEV-21065 UNIQUE constraint causes a query with string comparison to omit a row in the result set	2019-12-16 12:57:08 +04:00
Oleksandr Byelkin	55b2281a5d	Merge branch '10.2' into 10.3	2019-10-31 10:58:06 +01:00
Marko Mäkelä	19ceaf2928	Merge 10.1 into 10.2	2019-10-25 12:57:36 +03:00
Sergei Golubchik	790a74d22b	Merge branch 'github/5.5' into 10.1	2019-10-23 15:55:23 +02:00
Sergei Golubchik	719ac0ad4a	crash in string-to-int conversion: using specially crafted strings, one could overflow the `shift` variable and cause a crash by dereferencing d10[-2147483648] (on a sufficiently old gcc). This is a correct fix and a test case for Bug #29723340: MYSQL SERVER CRASH AFTER SQL QUERY WITH DATA ?AST	2019-10-19 11:48:38 +02:00
Marko Mäkelä	be85d3e61b	Merge 10.2 into 10.3	2019-05-14 17:18:46 +03:00
Marko Mäkelä	26a14ee130	Merge 10.1 into 10.2	2019-05-13 17:54:04 +03:00
Vicențiu Ciorbaru	cb248f8806	Merge branch '5.5' into 10.1	2019-05-11 22:19:05 +03:00
Vicențiu Ciorbaru	5543b75550	Update FSF Address * Update wrong zip-code	2019-05-11 21:29:06 +03:00
Marko Mäkelä	df563e0c03	Merge 10.2 into 10.3 main.derived_cond_pushdown: Move all 10.3 tests to the end, trim trailing white space, and add an "End of 10.3 tests" marker. Add --sorted_result to tests where the ordering is not deterministic. main.win_percentile: Add --sorted_result to tests where the ordering is no longer deterministic.	2018-11-06 09:40:39 +02:00
Marko Mäkelä	32062cc61c	Merge 10.1 into 10.2	2018-11-06 08:41:48 +02:00
Marko Mäkelä	d63e198061	Merge 10.0 into 10.1	2018-11-05 12:15:17 +02:00
Alexander Barkov	75ceb6ff13	MDEV-17298 ASAN unknown-crash / READ of size 1 in my_strntoul_8bit upon INSERT .. SELECT	2018-10-31 14:25:26 +04:00
Marko Mäkelä	05459706f2	Merge 10.2 into 10.3	2018-08-03 15:57:23 +03:00
Marko Mäkelä	ef3070e997	Merge 10.1 into 10.2	2018-08-02 08:19:57 +03:00
Oleksandr Byelkin	cb5952b506	Merge branch '10.0' into bb-10.1-merge-sanja	2018-07-25 22:24:40 +02:00
Alexander Barkov	e2ac4098ed	Simplify caseup() and casedn() in charsets After the MDEV-13118 fix there's no code in the server that wants caseup/casedn to change the argument in place for simple charsets. Let's remove this logic and always return the result in a new string for all charsets, both simple and complex. 1. Removing the optimization that some character sets used in casedn() and caseup(), which allowed (and required) to change the case in-place, overwriting the string passed as the "src" argument. Now all CHARSET_INFO's work in the same way: none of them change the source string in-place, all of them now convert case from the source string to the destination string, leaving the source string untouched. 2. Adding "const" qualifier to the "char *src" parameter to caseup() and casedn(). 3. Removing duplicate implementations in ctype-mb.c. Now both caseup() and casedn() implementations for all CJK character sets use internally the same function my_casefold_mb() (the former my_casefold_mb_varlen()). 4. Removing the "unused" attribute from parameters of some my_case{up\|dn}_xxx() implementations, as the affected parameters are now used in the code. Previously these parameters were used only in DBUG_ASSERT().	2018-07-19 13:02:14 +04:00
luz.paz	3dd01669b4	Misc. typos Found via `codespell -i 3 -w --skip="./debian/po" -I ../mariadb-server-word-whitelist.txt ./cmake/ ./debian/ ./Docs/ ./include/ ./man/ ./plugin/ ./strings/`	2018-04-05 15:26:57 +04:00
Vladislav Vaintroub	9891ee5a2a	Fix and reenable Windows compiler warning C4800 (size_t conversion).	2018-01-26 10:37:46 +00:00
Alexander Barkov	0e5eef886a	MDEV-14350 Index use with collation utf8mb4_unicode_nopad_ci on LIKE pattern with wrong results	2017-12-08 13:19:19 +04:00
Vladislav Vaintroub	7354dc6773	MDEV-13384 - misc Windows warnings fixed	2017-09-28 17:20:46 +00:00
Alexander Barkov	5058ced5df	MDEV-7769 MY_CHARSET_INFO refactoring Part 3 (final): removing MY_CHARSET_HANDLER::well_formed_len().	2016-10-10 14:36:09 +04:00
Alexander Barkov	ee19806b8e	MDEV-9711 NO PAD collations Based on the patch from Daniil Medvedev (a Google Summer of Code task)	2016-09-06 12:50:02 +04:00
Alexander Barkov	e4f6fd5e12	MDEV-10743 LDML: a new syntax to reuse sort order from another 8bit simple collation	2016-09-06 12:37:11 +04:00
Alexander Barkov	1ca595fbf7	LDML refactoring for "MDEV-9711 NO PAD collations" - Moving detection of the MY_CS_CSSORT, MY_CS_PUREASCII, MY_CS_NONASCII flags of loadable collations from add_collation() in mysys.c to my_cset_init_8bit() and my_coll_init_simple() in ctype-simple.c. - Adding tests that these flags are set properly for loadable collations - Moving LDML test related .xml files from mysql-test/std_data/ to mysql-test/std_data/ldml/, as there will be more .xml test files	2016-09-03 09:05:56 +04:00
Alexander Barkov	e7ff281d2e	MDEV-6353 my_ismbchar() and my_mbcharlen() refactoring	2016-05-17 15:27:10 +04:00
Alexander Barkov	1d73005bf3	MDEV-8360 Clean-up CHARSET_INFO: strnncollsp: diff_if_only_endspace_difference - Removing the "diff_if_only_endspace_difference" argument from MY_COLLATION_HANDLER::strnncollsp(), my_strnncollsp_simple(), as well as in the function template MY_FUNCTION_NAME(strnncollsp) in strcoll.ic - Removing the "diff_if_only_space_different" from ha_compare_text(), hp_rec_key_cmp(). - Adding a new function my_strnncollsp_padspace_bin() and reusing it instead of duplicate code pieces in my_strnncollsp_8bit_bin(), my_strnncollsp_latin1_de(), my_strnncollsp_tis620(), my_strnncollsp_utf8_cs(). - Adding more tests for better coverage of the trailing space handling. - Removing the unused definition of HA_END_SPACE_ARE_EQUAL	2016-03-31 11:04:48 +04:00
Alexander Barkov	e09299511e	MDEV-9665 Remove cs->cset->ismbchar() Using a more powerful cs->cset->charlen() instead.	2016-03-16 10:55:12 +04:00
Alexander Barkov	d9b25ae3db	MDEV-8466 CAST works differently for DECIMAL/INT vs DOUBLE for empty strings MDEV-8468 CAST and INSERT work differently for DECIMAL/INT vs DOUBLE for a string with trailing spaces	2015-09-17 11:05:07 +04:00
Alexander Barkov	78b80cb6ba	Adding MY_CHARSET_HANDLER::native_to_mb(). This is a pre-requisite patch for: - MDEV-8433 Make field<'broken-string' use indexes - MDEV-8625 Bad result set with ignorable characters when using a prefix key - MDEV-8626 Bad result set with contractions when using a prefix key	2015-08-14 18:34:41 +04:00
Alexander Barkov	75931feabe	MDEV-8362 dash '-' is not recognized in charset armscii8 on select where query	2015-07-14 12:00:05 +04:00
Alexander Barkov	197afb413f	MDEV-6566 Different INSERT behaviour on bad bytes with and without character set conversion	2015-03-13 16:51:36 +04:00
Alexander Barkov	b1b6101af2	A preparatory patch for MDEV-6566. Adding a new virtual function MY_CHARSET_HANDLER::copy_abort(). Moving character set specific code into the corresponding implementations (for simple, multi-byte and mbmaxlen>1 character sets).	2015-03-02 18:24:22 +04:00
Alexander Barkov	807934d083	MDEV-7086 main.ctype_cp932 fails in buildbot on a valgrind build Removing a redundant and wrong condition which could access beyond the pattern string range.	2014-11-18 13:07:37 +04:00
Michael Widenius	c4f5326bb7	MDEV-6255 DUPLICATE KEY Errors on SELECT .. GROUP BY that uses temporary and filesort. The problem was that my_hash_sort didn't properly delete end-space characters, so strings that should compare identically were seen as different strings. (Space was handled correctly, but not NBSP) This caused duplicate key errors when a heap table was converted to Aria as part of overflow in group by. Fixed by removing all characters that compare as end space when creating a hash. Other things: - Fixed that --sorted_results also works for errors in mysqltest. - Speed up hash by not comparing strings that have different hashes. - Speed up many my_hash_sort functions by using registers to calculate hash instead of pointers. This was previously done for some functions, but not for all. - Made a macro of the hash function, to simplify code and to be able to experiment with new hash functions. client/mysqltest.cc: Fixed that --sorted_results also works for error messages. mysql-test/r/ctype_partitions.result: New test to ensure that partitions on hash works mysql-test/suite/multi_source/gtid.result: Updated result mysql-test/suite/multi_source/gtid.test: Test that --sorted_result works for error messages mysql-test/suite/multi_source/gtid_ignore_duplicates.result: Updated result mysql-test/suite/multi_source/gtid_ignore_duplicates.test: Updated result mysql-test/suite/multi_source/load_data.result: Updated result mysql-test/suite/multi_source/load_data.test: Updated result mysql-test/t/ctype_partitions.test: New test to ensure that partitions on hash works storage/heap/hp_write.c: Speed up hash by not comparing strings that have different hashes. 
storage/maria/ma_check.c: Extra debug strings/ctype-bin.c: Use macro for hash function strings/ctype-latin1.c: Use macro for hash function Use registers to calculate hash (speedup) strings/ctype-mb.c: Use macro for hash function Use registers to calculate hash (speedup) strings/ctype-simple.c: Use macro for hash function Use same variable names as in other my_hash_sort functions. Update my_hash_sort_simple() to properly remove end space (patch by Bar) strings/ctype-uca.c: Ignore duplicated space inside strings and end space in my_hash_sort_uca(). This fixed MDEV-6255 Use macro for hash function Use registers to calculate hash (speedup) strings/ctype-ucs2.c: Use macro for hash function Use registers to calculate hash (speedup) strings/ctype-utf8.c: Use macro for hash function Use registers to calculate hash (speedup) strings/strings_def.h: Made a macro of the hash function, to simplify code and to be able to experiment with new hash functions.	2014-09-11 22:42:35 +03:00
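
The MDEV-6255 entry above turns on one invariant: if a collation treats two strings as equal, their hashes must also be equal, so every character that compares as end space has to be dropped before hashing. A minimal Python sketch of that idea follows. It is not the server's my_hash_sort code; the names PAD_EQUIVALENT, strip_end_space and pad_space_hash are hypothetical, the set of end-space characters (space and NBSP) is an assumption taken from the commit text, and FNV-1a stands in for the real hash macro.

```python
# Illustrative sketch only; not MariaDB's my_hash_sort implementation.
# Assumption: the collation treats both SPACE and NBSP as end space.
PAD_EQUIVALENT = {" ", "\u00a0"}


def strip_end_space(s: str) -> str:
    """Drop every trailing character that compares as end space."""
    end = len(s)
    while end > 0 and s[end - 1] in PAD_EQUIVALENT:
        end -= 1
    return s[:end]


def pad_space_equal(a: str, b: str) -> bool:
    """PAD SPACE style comparison: trailing end-space characters are ignored."""
    return strip_end_space(a) == strip_end_space(b)


def pad_space_hash(s: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of the stripped string.

    FNV-1a is a stand-in chosen for this sketch. Hashing the stripped
    string guarantees that strings equal under pad_space_equal() get
    the same hash value.
    """
    h = 2166136261
    for byte in strip_end_space(s).encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    for candidate in ("abc", "abc  ", "abc\u00a0"):
        print(repr(candidate), pad_space_hash(candidate))
```

If only the plain space were stripped, "abc" and "abc" followed by NBSP would compare equal but hash differently, which is the kind of mismatch the commit describes as producing duplicate key errors.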


201 Commits
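
The contextually typed collation names described in the MDEV-27009 entry can be pictured as a simple prefixing rule: a short name such as uca1400_as_ci takes its character set from the context. The Python sketch below only illustrates that rule as the commit message states it. The function name full_collation_name is hypothetical, and raising an error for a non-Unicode context character set is an assumption of this sketch, not documented server behaviour.

```python
# Character sets that the commit message lists for the uca1400 collations.
UCA1400_CHARSETS = ("utf8mb3", "utf8mb4", "ucs2", "utf16", "utf32")


def full_collation_name(collation: str, context_charset: str) -> str:
    """Expand a short uca1400 collation name using the context character set.

    Names that do not start with "uca1400_" are treated as already full
    and returned unchanged (old collations keep their prefix).
    """
    if not collation.startswith("uca1400_"):
        return collation
    if context_charset not in UCA1400_CHARSETS:
        # Assumption of this sketch: reject contexts the commit does not list.
        raise ValueError(
            f"{collation} is not applicable to character set {context_charset}"
        )
    return f"{context_charset}_{collation}"


if __name__ == "__main__":
    # Mirrors the example in the commit message: context utf8mb4.
    print(full_collation_name("uca1400_as_ci", "utf8mb4"))
    print(full_collation_name("latin1_swedish_ci", "latin1"))
```

The first call corresponds to the CREATE TABLE example in the commit message, where the column ends up with the collation utf8mb4_uca1400_as_ci.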