This patch adds a way to override default collations
(or "character set collations") for desired character sets.
The SQL standard says:
> Each collation known in an SQL-environment is applicable to one
> or more character sets, and for each character set, one or more
> collations are applicable to it, one of which is associated with
> it as its character set collation.
In MariaDB, character set collations have been hard-coded so far,
e.g. utf8mb4_general_ci has been a hard-coded character set collation
for utf8mb4.
This patch allows overriding character set collations (globally per
server, or per session), so that, for example, uca1400_ai_ci can be set
as the character set collation for Unicode character sets
(instead of the compiled-in xxx_general_ci).
The array of overridden character set collations is stored in a new
(session and global) system variable @@character_set_collations and
can be set as a comma separated list of charset=collation pairs, e.g.:
SET @@character_set_collations='utf8mb3=uca1400_ai_ci,utf8mb4=uca1400_ai_ci';
The variable is empty by default, which means that the hard-coded
character set collations are used (e.g. utf8mb4_general_ci for utf8mb4).
The variable can also be set globally on the server startup command
line and/or in my.cnf.
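To illustrate the value format described above, here is a minimal sketch
that splits a comma separated list of charset=collation pairs into a map.
The function name parse_character_set_collations_sketch() is hypothetical
and this is not the server's parser: it does not validate names against
known character sets or collations and does not handle whitespace. It
follows the common server convention of returning true on error.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not the server's code): parses
// "cs1=coll1,cs2=coll2" into *out. Returns true on error.
bool parse_character_set_collations_sketch(
       const std::string &value,
       std::map<std::string, std::string> *out)
{
  out->clear();
  if (value.empty())
    return false;             // Empty value: no overrides at all
  std::size_t pos= 0;
  while (pos <= value.size())
  {
    std::size_t comma= value.find(',', pos);
    if (comma == std::string::npos)
      comma= value.size();
    const std::string pair= value.substr(pos, comma - pos);
    const std::size_t eq= pair.find('=');
    // Reject a pair with no '=', no charset, or no collation
    if (eq == std::string::npos || eq == 0 || eq + 1 == pair.size())
      return true;
    (*out)[pair.substr(0, eq)]= pair.substr(eq + 1);
    pos= comma + 1;
  }
  return false;
}
```

In this sketch a later pair for the same character set silently replaces
an earlier one; whether the server accepts or rejects such duplicates is
not stated here.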
Problem:
UNIX_TIMESTAMP() called for an expression of the TIME data type
returned NULL.
Inside Type_handler_timestamp_common::Item_val_native_with_conversion
the call for item->get_date() did not convert TIME to DATETIME
automatically (because it does not have to, by design).
As a result, Type_handler_timestamp_common::TIME_to_native() received
a MYSQL_TIME value with zero date 0000-00-00 and therefore returned "true"
(indicating SQL NULL value).
Fix:
Removing the call for item->get_date().
Instantiating Datetime(item) instead.
This forces automatic TIME to DATETIME conversion
(unless @@old_mode is zero_date_time_cast).
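The difference between the old and the new behaviour can be sketched
with simplified stand-in types. Simple_time_sketch and both functions
below are hypothetical and only model the idea: a TIME value carries a
zero date, a zero date cannot become a timestamp, and TIME to DATETIME
conversion fills in the current date unless the zero_date_time_cast
mode is in effect. Hour overflow into days is ignored in this sketch.

```cpp
// Hypothetical simplified stand-in for MYSQL_TIME.
struct Simple_time_sketch
{
  unsigned year, month, day;
  unsigned hour, minute, second;
};

// Models the described contract of TIME_to_native():
// true means "SQL NULL", which is what a zero date produces.
bool is_null_as_timestamp_sketch(const Simple_time_sketch &t)
{
  return t.year == 0 && t.month == 0 && t.day == 0;
}

// Models automatic TIME to DATETIME conversion: the date part is taken
// from current_date unless zero_date_time_cast is requested.
Simple_time_sketch time_to_datetime_sketch(
                     const Simple_time_sketch &time_value,
                     const Simple_time_sketch &current_date,
                     bool zero_date_time_cast)
{
  Simple_time_sketch res= time_value;
  if (!zero_date_time_cast)
  {
    res.year= current_date.year;
    res.month= current_date.month;
    res.day= current_date.day;
  }
  return res;
}
```

With the conversion applied first, the value handed to the timestamp
conversion has a non-zero date and no longer yields NULL.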
Type_handler::partition_field_append_value() erroneously
passed the address of my_collation_contextually_typed_binary
to conversion functions copy_and_convert() and my_convert().
This happened because generate_partition_syntax_for_frm()
was called from mysql_create_frm_image() in the stage when
the fields in List<Create_field> can still contain unresolved
contextual collations, like "binary" in the reported crash scenario:
ALTER TABLE t CHANGE COLUMN a a CHAR BINARY;
Fix:
1. Splitting mysql_prepare_create_table() into two parts:
- mysql_prepare_create_table_stage1() iterates through
List<Create_field> and calls Create_field::prepare_stage1(),
which performs basic attribute initialization, including
context collation resolution.
- mysql_prepare_create_table_finalize() - the rest of the
old mysql_prepare_create_table() code.
2. Changing mysql_create_frm_image():
It now calls:
- mysql_prepare_create_table_stage1() in the very
beginning, before the partition related code.
- mysql_prepare_create_table_finalize() in the end,
instead of the old mysql_prepare_create_table() call
3. Adding mysql_prepare_create_table() as a wrapper
for two calls:
mysql_prepare_create_table_stage1() ||
mysql_prepare_create_table_finalize()
so the code stays unchanged in the other places
where mysql_prepare_create_table() was used.
4. Changing prototype for Type_handler::Column_definition_prepare_stage1()
Removing arguments:
- handler *file
- ulonglong table_flags
Adding a new argument instead:
- column_definition_type_t type
This allows calling Column_definition_prepare_stage1(), and
therefore mysql_prepare_create_table_stage1(),
before instantiation of a handler.
This simplifies the code, because in case of a partitioned table,
mysql_create_frm_image() creates a handler of the underlying
partition first, then frees it and creates a ha_partition
instance instead.
mysql_prepare_create_table() before the fix was called with the final
(ha_partition) handler.
5. Moving parts of Column_definition_prepare_stage1() which
need a pointer to handler and table_flags to
Column_definition_prepare_stage2().
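The wrapper from item 3 relies on the convention that these functions
return true on error, so "stage1() || finalize()" skips the second stage
when the first one fails. A minimal sketch with hypothetical stand-in
names (the real functions take THD, Alter_info and other arguments):

```cpp
#include <string>
#include <vector>

// Hypothetical context replacing the real argument list.
struct Create_context_sketch
{
  bool stage1_fails;
  bool finalize_fails;
  std::vector<std::string> trace;   // Records which stages ran
};

// Each stage returns true on error.
bool prepare_stage1_sketch(Create_context_sketch *ctx)
{
  ctx->trace.push_back("stage1");
  return ctx->stage1_fails;
}

bool prepare_finalize_sketch(Create_context_sketch *ctx)
{
  ctx->trace.push_back("finalize");
  return ctx->finalize_fails;
}

// Wrapper keeping the old single-call interface for other callers.
// Short-circuit evaluation means finalize runs only if stage1 succeeded.
bool prepare_create_table_sketch(Create_context_sketch *ctx)
{
  return prepare_stage1_sketch(ctx) || prepare_finalize_sketch(ctx);
}
```

A caller that needs to do work between the two stages, as
mysql_create_frm_image() does with the partition related code, calls
the two stages directly instead of the wrapper.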
- Adding data type aliases:
using Lex_column_charset_collation_attrs_st = Lex_charset_collation_st;
using Lex_column_charset_collation_attrs = Lex_charset_collation;
and using them all around the code (except lex_charset.*)
instead of the original names.
- Renaming Lex_field_type_st::lex_charset_collation()
to charset_collation_attrs()
- Renaming Column_definition::set_lex_charset_collation()
to set_charset_collation_attrs()
- Renaming Column_definition::lex_charset_collation()
to charset_collation_attrs()
Rationale:
The name "Lex_charset_collation" was not a very good name.
It does not tell details about its properties:
1. if the charset is optional (yes)
2. if the collation is optional (yes)
3. if the charset can be exact (yes) or context (no)
4. if the collation can be: exact (yes) or context (yes)
5. if the clauses can be repeated multiple times (yes)
We'll need a few new data types soon with different properties.
For example, to fix MDEV-27896 and MDEV-27782, we'll need a new
data type which is very similar to Lex_charset_collation, but additionally
supports CHARACTER SET DEFAULT (which is allowed on table and database level,
but is not allowed on the column level yet), i.e. with:
"the charset can be exact (yes) or context (yes)" in item 3.
So we'll have to rename Lex_charset_collation to something else,
e.g.: Lex_exact_charset_extended_collation_attrs,
and add a new data type:
e.g. Lex_extended_charset_extended_collation_attrs
Also, we'll possibly allow CHARACTER SET DEFAULT at the column level for
consistency with other places. So the storage on the column level can change:
- from Lex_exact_charset_extended_collation_attrs
- to Lex_extended_charset_extended_collation_attrs
Adding the aliases introduces a convenient abstraction against
upcoming renames and C++ data type changes.
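The effect of such an alias can be sketched as follows. All names ending
in _sketch are hypothetical stand-ins, not the real classes: code written
against the alias keeps compiling when the alias is later pointed at a
renamed or richer class that offers the same members.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the current class.
struct Lex_charset_collation_sketch
{
  std::string charset;
  std::string collation;
};

// Column-level code refers only to this alias. A later rename or type
// change then touches this one line instead of every user.
using Lex_column_attrs_sketch= Lex_charset_collation_sketch;

// Example of column-level code written against the alias.
std::string describe_column_sketch(const Lex_column_attrs_sketch &attrs)
{
  return attrs.charset + "/" + attrs.collation;
}

// An alias introduces no new type: both names denote the same class.
static_assert(std::is_same<Lex_column_attrs_sketch,
                           Lex_charset_collation_sketch>::value,
              "alias and original name denote the same type");
```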
Precision should be kept below DECIMAL_MAX_SCALE for computations.
It can be bigger in Item_decimal. I'd fix this too, but it changes the
existing behaviour, so it is problematic to fix.
This patch also fixes:
MDEV-27690 Crash on `CHARACTER SET csname COLLATE DEFAULT` in column definition
MDEV-27853 Wrong data type on column `COLLATE DEFAULT` and table `COLLATE some_non_default_collation`
MDEV-28067 Multiple conflicting column COLLATE clauses are not rejected
MDEV-28118 Wrong collation of `CAST(.. AS CHAR COLLATE DEFAULT)`
MDEV-28119 Wrong column collation on MODIFY + CONVERT
Hybrid functions (IF, COALESCE, etc) did not preserve the JSON property
from their arguments. The same problem was repeatable for single row subselects.
The problem happened because the method Item::is_json_type() was inconsistently
implemented across the Item hierarchy. For example, Item_hybrid_func
and Item_singlerow_subselect did not override is_json_type().
Solution:
- Removing Item::is_json_type()
- Implementing specific JSON type handlers:
Type_handler_string_json
Type_handler_varchar_json
Type_handler_tiny_blob_json
Type_handler_blob_json
Type_handler_medium_blob_json
Type_handler_long_blob_json
- Reusing the existing data type infrastructure to pass JSON
type handlers across all item types, including classes Item_hybrid_func
and Item_singlerow_subselect. Note, these two classes themselves do not
need any changes!
- Extending the data type infrastructure so data types can inherit
their properties (e.g. aggregation rules) from their base data types.
E.g. VARCHAR/JSON acts as VARCHAR, LONGTEXT/JSON acts as LONGTEXT
when mixed with a non-JSON data type. This is done by:
- adding virtual method Type_handler::type_handler_base()
- adding a helper class Type_handler_pair
- refactoring Type_handler_hybrid_field_type methods
aggregate_for_result(), aggregate_for_min_max(),
aggregate_for_num_op() to use Type_handler_pair.
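The base type fallback can be sketched like this. Handler_sketch,
Handler_pair_sketch and the aggregation rule are simplified,
hypothetical stand-ins and do not reproduce the real Type_handler
classes or their aggregation tables. They only show the idea: try the
pair as is, and if the handlers differ, replace each side by its base
type and try again.

```cpp
// Hypothetical simplified type handler.
struct Handler_sketch
{
  const char *name;
  const Handler_sketch *base;   // nullptr when there is no base type
  const Handler_sketch *type_handler_base() const { return base; }
};

static const Handler_sketch varchar_h= {"varchar", nullptr};
static const Handler_sketch longtext_h= {"longtext", nullptr};
static const Handler_sketch varchar_json_h= {"varchar/json", &varchar_h};
static const Handler_sketch longtext_json_h= {"longtext/json", &longtext_h};

// Hypothetical analogue of a handler pair helper.
struct Handler_pair_sketch
{
  const Handler_sketch *a;
  const Handler_sketch *b;
  // Replaces each side that has a base type by that base type.
  // Returns true if at least one side changed.
  bool to_base()
  {
    bool changed= false;
    if (const Handler_sketch *na= a->type_handler_base())
    { a= na; changed= true; }
    if (const Handler_sketch *nb= b->type_handler_base())
    { b= nb; changed= true; }
    return changed;
  }
};

const Handler_sketch *aggregate_for_result_sketch(const Handler_sketch *a,
                                                  const Handler_sketch *b)
{
  if (a == b)
    return a;                   // Same handler on both sides: keep it
  Handler_pair_sketch p= {a, b};
  p.to_base();                  // E.g. VARCHAR/JSON now acts as VARCHAR
  if (p.a == p.b)
    return p.a;
  // Toy rule for the remaining base types: LONGTEXT wins over VARCHAR
  return (p.a == &longtext_h || p.b == &longtext_h) ? &longtext_h
                                                    : &varchar_h;
}
```

Callers such as hybrid functions only see the resulting handler, which
is consistent with the note above that Item_hybrid_func and
Item_singlerow_subselect themselves need no changes.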
This change also fixes:
MDEV-27361 Hybrid functions with JSON arguments do not send format metadata
Also adding mtr tests for JSON replication, which was not covered yet,
while the current patch changes the replication code slightly.
The changes to galera.galera_var_replicate_myisam_on
in commit d9b933bec6061758c5d7b34f55afcae32a85c110
are omitted due to conflicts
with commit 27d66d644cf2ebe9201e0362f2050036cce2908a.
This change removed 68 explicit strlen() calls from the code.
The following renames were done to ensure we don't use the old names
when merging code from earlier releases, as using the new variables
for print functions could result in crashes:
- charset->csname renamed to charset->cs_name
- charset->name renamed to charset->coll_name
Almost all changes were mechanical, except:
- Changed to use the new Protocol::store(LEX_CSTRING..) when possible
- Changed to use field->store(LEX_CSTRING*, CHARSET_INFO*) when possible
- Changed to use String->append(LEX_CSTRING&) when possible
Other things:
- There were compiler issues with ensuring that all character set names
point to the same string: gcc doesn't allow one to use integer constants
when defining global structures (constant char * pointers work fine).
To get around this, I declared defines for each character set name
length.
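A minimal sketch of the workaround, using hypothetical names (the real
define names and structures are not shown here): the name and its
length are both preprocessor defines, so a global structure can be
initialized from them, and a compile-time check keeps the two in sync.

```cpp
#include <cstddef>

// Hypothetical stand-in for a LEX_CSTRING-like pair.
struct Cstring_sketch
{
  const char *str;
  std::size_t length;
};

// Hypothetical defines: one for the name, one for its length.
#define UTF8MB4_NAME_SKETCH        "utf8mb4"
#define UTF8MB4_NAME_LENGTH_SKETCH 7

// Global structure initialized from the two defines.
static const Cstring_sketch utf8mb4_name_sketch=
{ UTF8MB4_NAME_SKETCH, UTF8MB4_NAME_LENGTH_SKETCH };

// Guards against the length define drifting away from the literal.
static_assert(sizeof(UTF8MB4_NAME_SKETCH) - 1 == UTF8MB4_NAME_LENGTH_SKETCH,
              "length define must match the string literal");
```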
Changes:
- To detect automatic strlen() I removed the methods in String that
use 'const char *' without a length:
- String::append(const char*)
- Binary_string(const char *str)
- String(const char *str, CHARSET_INFO *cs)
- append_for_single_quote(const char *)
All usages of append(const char*) are changed to use either
String::append(char), String::append(const char*, size_t length) or
String::append(LEX_CSTRING)
- Added STRING_WITH_LEN() around constant string arguments to
String::append()
- Added overflow argument to escape_string_for_mysql() and
escape_quotes_for_mysql() instead of returning (size_t) -1 on overflow.
This was needed as most usage of the above functions never tested the
result for -1 and would have given wrong results or crashes in case
of overflows.
- Added Item_func_or_sum::func_name_cstring(), which returns LEX_CSTRING.
Changed all Item_func::func_name()'s to func_name_cstring()'s.
The old Item_func_or_sum::func_name() is now an inline function that
returns func_name_cstring().str.
- Changed Item::mode_name() and Item::func_name_ext() to return
LEX_CSTRING.
- Changed the name argument of some functions from const char * to
const LEX_CSTRING &:
- Item::Item_func_fix_attributes()
- Item::check_type_...()
- Type_std_attributes::agg_item_collations()
- Type_std_attributes::agg_item_set_converter()
- Type_std_attributes::agg_arg_charsets...()
- Type_handler_hybrid_field_type::aggregate_for_result()
- Type_handler_geometry::check_type_geom_or_binary()
- Type_handler::Item_func_or_sum_illegal_param()
- Predicant_to_list_comparator::add_value_skip_null()
- Predicant_to_list_comparator::add_value()
- cmp_item_row::prepare_comparators()
- cmp_item_row::aggregate_row_elements_for_comparison()
- Cursor_ref::print_func()
- Removed String_space() as it was only used in one case, which
could be simplified to not use String_space() thanks to the fixed
my_vsnprintf().
- Added some const LEX_CSTRING's for common strings:
- NULL_clex_str, DATA_clex_str, INDEX_clex_str.
- Changed primary_key_name to a LEX_CSTRING
- Renamed String::set_quick() to String::set_buffer_if_not_allocated() to
clarify what the function really does.
- Rename of protocol function:
bool store(const char *from, CHARSET_INFO *cs) to
bool store_string_or_null(const char *from, CHARSET_INFO *cs).
This was done both to clarify the difference between these 'store'
functions and to make it easier to find suboptimal usage of store() calls.
- Added Protocol::store(const LEX_CSTRING*, CHARSET_INFO*)
- Changed some 'const char*' arrays to instead be of type LEX_CSTRING.
- class Item_func_units now uses LEX_CSTRING for name.
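The reasoning behind the separate overflow argument can be sketched
with a simplified escaping function. escape_quotes_sketch() is
hypothetical and does not reproduce the signature or the escaping rules
of escape_quotes_for_mysql(): it only doubles single quotes. The point
is that the return value is always a usable length, while overflow is
reported through a flag that cannot be mistaken for a length.

```cpp
#include <cstddef>

// Hypothetical simplified analogue: copies 'from' into 'to', doubling
// each single quote. Always writes a terminating '\0' when to_size > 0.
// Returns the number of bytes written, not counting the terminator,
// and sets *overflow when the output buffer was too small.
std::size_t escape_quotes_sketch(char *to, std::size_t to_size,
                                 const char *from, std::size_t length,
                                 bool *overflow)
{
  *overflow= false;
  if (to_size == 0)
  {
    *overflow= true;
    return 0;
  }
  char *out= to;
  char *end= to + to_size - 1;          // Reserve room for '\0'
  for (std::size_t i= 0; i < length; i++)
  {
    const std::size_t need= (from[i] == '\'') ? 2 : 1;
    if (static_cast<std::size_t>(end - out) < need)
    {
      *overflow= true;                  // Stop before writing past 'end'
      break;
    }
    if (need == 2)
      *out++= '\'';
    *out++= from[i];
  }
  *out= '\0';
  return static_cast<std::size_t>(out - to);
}
```

A caller that ignores the flag still gets a valid, terminated (if
truncated) string and a sane length, instead of a (size_t) -1 that
would be used as a huge length.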
Other things:
- Fixed a bug in mysql.cc:construct_prompt() where a wrong escape character
in the prompt would cause some part of the prompt to be duplicated.
- Fixed a lot of instances where the length of the argument to
append is known or easily obtained but was not used.
- Removed some unneeded 'virtual' definitions for functions that were
inherited from the parent. I added override to these.
- Fixed Ordered_key::print() to preallocate needed buffer. Old code could
cause memory overruns.
- Simplified some loops when adding char * to a String with delimiters.