The task "MDEV-25829 Change default Unicode collation to uca1400_ai_ci"
previously changed collation derivation for string user variables
from DERIVATION_EXPLICIT to DERIVATION_COERCIBLE, to resolve illegal
collation mix conflicts between table columns and user variables
when they have different collations.
However, DERIVATION_COERCIBLE was a wrong choice because it caused
conflicts between string literals and user variables when they have
different collations.
Adding a new collation derivation level DERIVATION_USERVAR.
This makes the collation of a user variable:
- weaker than a table column (like it was intended by MDEV-25829)
- but stronger than a literal (like it was in pre-MDEV-25829)
Cleanup in sql_type.h:
Removing the line "- BINARY(expr)" from the before-DERIVATION_CAST
comment, as it was on a wrong place. It's also listed on the correct
place before DERIVATION_IMPLICIT.
Field_blob::store() has special code for GROUP_CONCAT temporary table
(to store blob values in Blob_mem_storage - this prevents them
from being freed/overwritten when a next row is read).
Field_geom and Field_blob_compressed inherit from Field_blob but they
have their own ::store() method without this special Blob_mem_storage
support.
Considering that non-grouping CONCAT() of such fields converts
them to plain BLOB, let's do the same for GROUP_CONCAT. To do it,
Item_func_group_concat::setup will signal that it's creating
a temporary table for GROUP_CONCAT, and Field_blog::make_new_field()
override will create base Field_blob when under group concat.
Step#2 - Adding a new collation derivation level for CAST and CONVERT.
Now character string cast functions:
- CAST(string_expr AS CHAR)
- CONVERT(expr USING charset_name)
have a new collation derivation level between:
- string literals
- utf8 metadata functions, e.g. user() and database()
Before the change these cast functions had collation derivation equal
to table columns, which caused more illegal mix of collation conflicts.
Note, binary string cast functions:
- BINARY(expr)
- CAST(string_expr AS BINARY)
- CONVERT(expr USING binary)
did not change their collation derivation, to preserve the behaviour of
queries like these:
SELECT database()=BINARY'test';
SELECT user()=CAST('root' AS BINARY);
SELECT current_role()=CONVERT('role' USING binary);
Derivation levels after the change look as follows:
DERIVATION_IGNORABLE= 7, // Explicit NULL
DERIVATION_NUMERIC= 6, // Numbers in string context,
// Numeric user variables
// CAST(numeric_expr AS CHAR)
DERIVATION_COERCIBLE= 5, // Literals, string user variables
DERIVATION_CAST= 4, // CAST(string_expr AS CHAR),
// CONVERT(string_expr USING cs)
DERIVATION_SYSCONST= 3, // utf8 metadata functions, e.g. user(), database()
DERIVATION_IMPLICIT= 2, // Table columns, SP variables, BINARY(expr)
DERIVATION_NONE= 1, // A mix (e.g. CONCAT) of two differrent collations
DERIVATION_EXPLICIT= 0 // An explicit COLLATE clause
It was possible to create virtual columns with incompatible
GENERATED ALWAYS expression data types:
CREATE TABLE t1 (a INT, b POINT GENERATED ALWAYS AS (a));
CREATE TABLE t1 (a POINT, b INT GENERATED ALWAYS AS (a));
These data type combinations are not allowed in other cases,
e.g. INSERT, UPDATE, SP variable assignment.
Fix:
Disallowing bad combinations of the column data type and its
GENERATED ALWAYS expression data type.
Fixing the problem that an operation involving a mix of
two or more GEOMETRY operands did not preserve their SRIDs.
Now SRIDs are preserved by hybrid functions, subqueries, TVCs, UNIONs, VIEWs.
Incompatible change:
An attempt to mix two different SRIDs now raises an error.
Details:
- Adding a new class Type_extra_attributes. It's a generic
container which can store very specific data type attributes.
For now it can store one uint32 and one const pointer attribute
(for GEOMETRY's SRID and for ENUM/SET TYPELIB respectively).
In the future it can grow as needed.
Type_extra_attributes will also be reused soon to store "const Type_zone*"
pointers for the TIMESTAMP's "WITH TIME ZONE 'tz'" attribute
(a timestamp data type with a fixed time zone independent from @@time_zone).
The time zone attribute will be stored in exactly the same way like
a TYPELIB pointer is stored by ENUM/SET.
- Removing Column_definition_attributes members "interval" and "srid".
Deriving Column_definition_attributes from the generic attribute container
Type_extra_attributes instead.
- Adding a new class Type_typelib_attributes, to store
the TYPELIB of the ENUM and SET data types. Deriving Field_enum from it.
Removing the member Field_enum::typelib.
- Adding a new class Type_geom_attributes, to store
the GEOMETRY related attributes. Deriving Field_geom from it.
Removing the member Field_geom::srid.
- Removing virtual methods:
Field::get_typelib()
Type_all_attributes::get_typelib() and
Type_all_attributes::set_typelib()
They were very specific to TYPELIB.
Adding more generic virtual methods instead:
* Field::type_extra_attributes() - to get extra attributes
* Type_all_attributes::type_extra_attributes() - to get extra attributes
* Type_all_attributes::type_extra_attributes_addr() - to set extra attributes
- Removing Item_type_holder::enum_set_typelib. Deriving Item_type_holder
from the generic attribute container Type_extra_attributes instead.
This makes it possible for UNION to preserve SRID
(in addition to preserving TYPELIB).
- Deriving Item_hybrid_func from Type_extra_attributes.
This makes it possible for hybrid functions (e.g. CASE, COALESCE,
LEAST, GREATEST etc) to preserve SRID.
- Deriving Item_singlerow_subselect from Type_extra_attributes and
overriding methods:
* Item_cache::type_extra_attributes()
* subselect_single_select_engine::fix_length_and_dec()
* Item_singlerow_subselect::type_extra_attributes()
* Item_singlerow_subselect::type_extra_attributes_addr()
This is needed to preserve SRID in subqueries and TVCs
- Cleanup: fixing the data type of members
* Binlog_type_info::m_enum_typelib
* Binlog_type_info::m_set_typelib
from "TYPELIB *" to "const TYPELIB *"
1. Store assignment failures on incompatible data types now raise errors if:
- STRICT_ALL_TABLES or STRICT_TRANS_TABLES sql_mode is used, and
- IGNORE is not used
Otherwise, only a warning is raised and the statement continues.
2. Changing the error/warning test as follows:
-ERROR HY000: Illegal parameter data types inet6 and int for operation 'SET'
+ERROR HY000: Cannot cast 'int' as 'inet6' in assignment of `db`.`t`.`col`
so in case of a big table it's easier to see which column has the problem.
The new error text is also applied to SP variables.
Now INSERT, UPDATE, ALTER statements involving incompatible data type pairs, e.g.:
UPDATE TABLE t1 SET col_inet6=col_int;
INSERT INTO t1 (col_inet6) SELECT col_in FROM t2;
ALTER TABLE t1 MODIFY col_inet6 INT;
consistently return an error at the statement preparation time:
ERROR HY000: Illegal parameter data types inet6 and int for operation 'SET'
and abort the statement before starting interating rows.
This error is the same with what is raised for queries like:
SELECT col_inet6 FROM t1 UNION SELECT col_int FROM t2;
SELECT COALESCE(col_inet6, col_int) FROM t1;
Before this change the error was caught only during the execution time,
when a Field_xxx::store_xxx() was called for the very firts row.
The behavior was not consistent between various statements and could do different things:
- abort the statement
- set a column to the data type default value (e.g. '::' for INET6)
- set a column to NULL
A typical old error was:
ERROR 22007: Incorrect inet6 value: '1' for column `test`.`t1`.`a` at row 1
EXCEPTION:
Note, there is an exception: a multi-row INSERT..VALUES, e.g.:
INSERT INTO t1 (col_a,col_b) VALUES (a1,b1),(a2,b2);
checks assignment compability at the preparation time for the very first row only:
(col_a,col_b) vs (a1,b1)
Other rows are still checked at the execution time and return the old warnings
or errors in case of a failure. This is done because catching all rows at the
preparation time would change behavior significantly. So it still works
according to the STRICT_XXX_TABLES sql_mode flags and the table transaction ability.
This is too late to change this behavior in 10.7.
There is no a firm decision yet if a multi-row INSERT..VALUES
behavior will change in later versions.
The 10.5 version of the patch.
Removing DEFAULT from INFORMATION_SCHEMA columns.
DEFAULT in read-only tables is rather meaningless.
Upgrade should go smoothly.
Also fixes:
MDEV-20254 Problems with EMPTY_STRING_IS_NULL and I_S tables
Removing DEFAULT from INFORMATION_SCHEMA columns.
DEFAULT in read-only tables is rather meaningless.
Upgrade should go smoothly.
Also fixes:
MDEV-20254 Problems with EMPTY_STRING_IS_NULL and I_S tables
This patch changes the main name of 3 byte character set from utf8 to
utf8mb3. New old_mode UTF8_IS_UTF8MB3 is added and set TRUE by default,
so that utf8 would mean utf8mb3. If not set, utf8 would mean utf8mb4.