Partial commit of the greater MDEV-34348 scope.
MDEV-34348: MariaDB is violating clang-16 -Wcast-function-type-strict
The functions queue_compare, qsort2_cmp, and qsort_cmp2
all had similar interfaces and were used interchangeably,
being unsafely cast to one another.
This patch consolidates all of these functions into the
qsort_cmp2 interface.
Reviewed By:
============
Marko Mäkelä <marko.makela@mariadb.com>
LEAST() and GREATEST() erroneously calculated the result as signed
for BIGINT UNSIGNED arguments.
Adding a new method for unsigned arguments:
Item_func_min_max::val_uint_native()
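For illustration (the table and values are made up):
  CREATE OR REPLACE TABLE t1 (a BIGINT UNSIGNED, b BIGINT UNSIGNED);
  INSERT INTO t1 VALUES (18446744073709551615, 1);
  -- Before the fix the result was calculated as signed, so the large
  -- value could be treated as negative and compare as the smaller one:
  SELECT GREATEST(a, b), LEAST(a, b) FROM t1;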
The code erroneously called sec_since_epoch() for dates with zeros,
e.g. '2024-00-01'.
Fix: adding a check that the date does not have zeros before
calling TIME_to_native().
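A sketch of the kind of query that could reach the bad path (the exact
context calling sec_since_epoch() may vary):
  CREATE OR REPLACE TABLE t1 (a TIMESTAMP);
  -- '2024-00-01' has a zero month part, so it must not be passed
  -- to sec_since_epoch():
  SELECT * FROM t1 WHERE a = '2024-00-01';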
A mixture of a multi-byte *TEXT column and a short binary column
produced a too large column.
For example, COALESCE(tinytext_utf8mb4, short_varbinary)
produced a BLOB column instead of an expected TINYBLOB.
- Adding a virtual method Type_all_attributes::character_octet_length(),
returning max_length by default.
- Overriding Item_field::character_octet_length() to extract
the octet length from the underlying Field.
- Overriding Item_ref::character_octet_length() to extract
the octet length from the referenced Item (e.g. for VIEW fields).
- Fixing Type_numeric_attributes::find_max_octet_length() to
take the octet length using the new method character_octet_length()
instead of accessing max_length directly.
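The example from the description, as a reproducible sketch (names are
illustrative):
  CREATE OR REPLACE TABLE t1 (
    a TINYTEXT CHARACTER SET utf8mb4, -- octet length 255
    b VARBINARY(16)                   -- a short binary column
  );
  CREATE OR REPLACE TABLE t2 AS SELECT COALESCE(a, b) AS c FROM t1;
  -- c is now expected to be TINYBLOB; before the fix it was BLOB:
  SHOW CREATE TABLE t2;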
Functions extracting non-negative datetime components:
- YEAR(dt), EXTRACT(YEAR FROM dt)
- QUARTER(dt), EXTRACT(QUARTER FROM dt)
- MONTH(dt), EXTRACT(MONTH FROM dt)
- WEEK(dt), EXTRACT(WEEK FROM dt)
- HOUR(dt)
- MINUTE(dt)
- SECOND(dt)
- MICROSECOND(dt)
- DAYOFYEAR(dt)
- EXTRACT(YEAR_MONTH FROM dt)
did not set their max_length properly, so in the DECIMAL
context they created a too small DECIMAL column, which
led to the 'Out of range value' error.
The problem is that most of these functions historically
returned the signed INT data type.
There were two simple ways to fix these functions:
1. Add +1 to max_length.
But this would also change their size in the string context
and create VARCHAR columns one character longer than needed.
2. Preserve max_length, but change the data type from INT to INT UNSIGNED.
But this would break backward compatibility.
Also, using UNSIGNED is generally not desirable,
it's better to stay with signed when possible.
This fix implements another solution, which makes all these functions
work well in all contexts: int, decimal, string.
Fix details:
- Adding a new special class Type_handler_long_ge0 - the data type
handler for expressions which:
* should look like normal signed INT
* but which are known not to return negative values
Expressions handled by Type_handler_long_ge0 store in Item::max_length
only the number of digits, without adding +1 for the sign.
- Fixing Item_extract to use Type_handler_long_ge0
for non-negative datetime components:
YEAR, YEAR_MONTH, QUARTER, MONTH, WEEK
- Adding a new abstract class Item_long_ge0_func, for functions
returning non-negative datetime components.
Item_long_ge0_func uses Type_handler_long_ge0 as the type handler.
The class hierarchy now looks as follows:
Item_long_ge0_func
  Item_long_func_date_field
    Item_func_to_days
    Item_func_dayofmonth
    Item_func_dayofyear
    Item_func_quarter
    Item_func_year
  Item_long_func_time_field
    Item_func_hour
    Item_func_minute
    Item_func_second
    Item_func_microsecond
- Cleanup: EXTRACT(QUARTER FROM dt) created an excessive VARCHAR column
in string context. Changing its length from 2 to 1.
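A sketch of the DECIMAL context problem being fixed (the COALESCE()
call only forces a DECIMAL result; the exact pre-fix column sizes
varied per function):
  CREATE OR REPLACE TABLE t1 (a DATETIME);
  INSERT INTO t1 VALUES ('2001-12-13 14:15:16');
  -- MINUTE() is known to be non-negative; with Type_handler_long_ge0
  -- its max_length covers all of its digits, so the DECIMAL column
  -- derived here is large enough and no 'Out of range value' error
  -- is raised:
  CREATE OR REPLACE TABLE t2 AS
    SELECT COALESCE(MINUTE(a), 0.0) AS c FROM t1;
  SELECT * FROM t2;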
During the 10.5->10.6 merge please use the 10.6 code on conflicts.
This is the 10.5 version of the patch (a backport of the 10.6 version).
Unlike the 10.6 version, it makes changes in plugin/type_inet/sql_type_inet.*
rather than in sql/sql_type_fixedbin.h
Item_bool_rowready_func2, Item_func_between, Item_func_in
did not check if a not-NULL argument of an arbitrary data type
can produce a NULL value on conversion to INET6.
This caused a crash on DBUG_ASSERT() in conversion failures,
because the function returned SQL NULL for something that
has Item::maybe_null() equal to false.
Adding code to set NULL-ability in such cases.
Details:
- Removing the code in Item_func::setup_args_and_comparator()
performing character set aggregation with optional narrowing.
This aggregation is done inside Arg_comparator::set_cmp_func_string().
So this code was redundant.
- Removing Item_func::setup_args_and_comparator(), as it got simplified
to just two lines:
convert_const_compared_to_int_field(thd);
return cmp->set_cmp_func(thd, this, &args[0], &args[1], true);
Using these lines directly in:
- Item_bool_rowready_func2::fix_length_and_dec()
- Item_func_nullif::fix_length_and_dec()
- Adding a new virtual method:
- Type_handler::Item_bool_rowready_func2_fix_length_and_dec().
- Adding checks, detecting whether the data type conversion can return
SQL NULL, into the following methods of Type_handler_inet6:
- Item_bool_rowready_func2_fix_length_and_dec
- Item_func_between_fix_length_and_dec
- Item_func_in_fix_comparator_compatible_types
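An illustration of the previously crashing pattern (values are made up):
  CREATE OR REPLACE TABLE t1 (a INET6 NOT NULL);
  INSERT INTO t1 VALUES ('::1');
  -- 'garbage' cannot be converted to INET6, so each comparison can
  -- produce SQL NULL and must be marked NULL-able, even though
  -- neither argument is NULL-able by itself:
  SELECT * FROM t1 WHERE a = 'garbage';
  SELECT * FROM t1 WHERE a BETWEEN 'garbage' AND '::2';
  SELECT * FROM t1 WHERE a IN ('garbage', '::2');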
When aggregating pairs BIT+NULL and NULL+BIT for result, e.g.
in COALESCE(), preserve the BIT data type (ignore explicit NULLs).
The same fix is applied to YEAR.
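For example (illustrative):
  CREATE OR REPLACE TABLE t1 (a BIT(8), y YEAR);
  CREATE OR REPLACE TABLE t2 AS
    SELECT COALESCE(a, NULL) AS c1, COALESCE(NULL, y) AS c2 FROM t1;
  -- c1 is expected to preserve BIT and c2 to preserve YEAR,
  -- rather than being converted to a more generic data type:
  SHOW CREATE TABLE t2;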
Problem:
UNIX_TIMESTAMP() called for an expression of the TIME data type
returned NULL.
Inside Type_handler_timestamp_common::Item_val_native_with_conversion
the call to item->get_date() did not convert TIME to DATETIME
automatically (because it does not have to, by design).
As a result, Type_handler_timestamp_common::TIME_to_native() received
a MYSQL_TIME value with zero date 0000-00-00 and therefore returned "true"
(indicating SQL NULL value).
Fix:
Removing the call to item->get_date().
Instantiating Datetime(item) instead.
This forces automatic TIME to DATETIME conversion
(unless @@old_mode is zero_date_time_cast).
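For example:
  -- TIME is now converted to DATETIME using the current date, so this
  -- returns a non-NULL value (unless @@old_mode is zero_date_time_cast):
  SELECT UNIX_TIMESTAMP(TIME'10:20:30');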
Precision should be kept below DECIMAL_MAX_SCALE for computations.
It can be bigger in Item_decimal. I'd fix this too, but it changes the
existing behaviour, so it is problematic to fix.
Hybrid functions (IF, COALESCE, etc) did not preserve the JSON property
of their arguments. The same problem was repeatable with single-row subselects.
The problem happened because the method Item::is_json_type() was inconsistently
implemented across the Item hierarchy. For example, Item_hybrid_func
and Item_singlerow_subselect did not override is_json_type().
Solution:
- Removing Item::is_json_type()
- Implementing specific JSON type handlers:
Type_handler_string_json
Type_handler_varchar_json
Type_handler_tiny_blob_json
Type_handler_blob_json
Type_handler_medium_blob_json
Type_handler_long_blob_json
- Reusing the existing data type infrastructure to pass JSON
type handlers across all item types, including classes Item_hybrid_func
and Item_singlerow_subselect. Note, these two classes themselves do not
need any changes!
- Extending the data type infrastructure so data types can inherit
their properties (e.g. aggregation rules) from their base data types.
E.g. VARCHAR/JSON acts as VARCHAR, LONGTEXT/JSON acts as LONGTEXT
when mixed to a non-JSON data type. This is done by:
- adding virtual method Type_handler::type_handler_base()
- adding a helper class Type_handler_pair
- refactoring Type_handler_hybrid_field_type methods
aggregate_for_result(), aggregate_for_min_max(),
aggregate_for_num_op() to use Type_handler_pair.
This change also fixes:
MDEV-27361 Hybrid functions with JSON arguments do not send format metadata
Also, adding mtr tests for JSON replication: it was not covered yet,
and the current patch changes the replication code slightly.
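A minimal illustration of the preserved JSON property (schema made up):
  CREATE OR REPLACE TABLE t1 (j JSON);
  -- COALESCE() now uses a JSON type handler when its arguments are
  -- JSON, so the result keeps the JSON property (visible e.g. in the
  -- format metadata sent to clients):
  CREATE OR REPLACE TABLE t2 AS SELECT COALESCE(j, j) AS c FROM t1;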
Problem:
The problem happened because of a conceptual flaw in the server code:
a. The table level CHARSET/COLLATE clause affected all data types,
including numeric and temporal ones:
CREATE TABLE t1 (a INT) CHARACTER SET utf8 [COLLATE utf8_general_ci];
In the above example, the Column_definition_attributes
(and then the FRM record) for the column "a" erroneously inherited
"utf8" as its character set.
b. The "ALTER TABLE t1 CONVERT TO CHARACTER SET csname" statement
also erroneously affected Column_definition_attributes::charset
for numeric and temporal data types and wrote "csname" as their
character set into FRM files.
So now we have arbitrary non-relevant charset ID values for numeric
and temporal data types in all FRM files in the world :)
The code in the server and the other engines did not seem to be affected
by this flaw. Only InnoDB inplace ALTER was affected.
Solution:
Fixing the code so that only character string data types
(CHAR,VARCHAR,TEXT,ENUM,SET):
- inherit the table level CHARSET/COLLATE clause
- get the charset value according to "CONVERT TO CHARACTER SET csname".
Numeric and temporal data types now always get &my_charset_numeric
in Column_definition_attributes::charset and always write its ID into FRM files:
- no matter what the table level CHARSET/COLLATE clause is, and
- no matter what "CONVERT TO CHARACTER SET" says.
Details:
1. Adding helper classes to pass small parts of HA_CREATE_INFO
into Type_handler methods:
- Column_derived_attributes - to pass table level CHARSET/COLLATE,
so columns that do not have explicit CHARSET/COLLATE clauses
can derive them from the table level, e.g.
CREATE TABLE t1 (a VARCHAR(1), b CHAR(1)) CHARACTER SET utf8;
- Column_bulk_alter_attributes - to pass bulk attribute changes
generated by the ALTER related code. These bulk changes affect
multiple columns at the same time:
ALTER TABLE ... CONVERT TO CHARACTER SET csname;
Note, passing the whole HA_CREATE_INFO directly to Type_handler
would not be good: HA_CREATE_INFO is huge and would bring undesired
dependencies into sql_type.h and sql_type.cc. The Type_handler API should
use the smallest possible data types!
2. Type_handler::Column_definition_prepare_stage1() is now responsible
to set Column_definition::charset properly, according to the data type,
for example:
- For string data types, Column_definition_attributes::charset is set from
the table level CHARSET/COLLATE clause (if not specified explicitly in
the column definition).
- For numeric and temporal fields, Column_definition_attributes::charset is
set to &my_charset_numeric, no matter what the table level
CHARSET/COLLATE says.
- For GEOMETRY, Column_definition_attributes::charset is set to
&my_charset_bin, no matter what the table level CHARSET/COLLATE says.
Previously this code (setting `charset`) was outside of
Column_definition_prepare_stage1(), namely in
mysql_prepare_create_table(), and was erroneously called for
all data types.
3. Adding Type_handler::Column_definition_bulk_alter(), to handle
"ALTER TABLE .. CONVERT TO". Previously this code was inside
get_sql_field_charset() and was erroneously called for all data types.
4. Removing the Schema_specification_st parameter from
Type_handler::Column_definition_redefine_stage1().
Column_definition_attributes::charset is now fully properly initialized by
Column_definition_prepare_stage1(). So we don't need access to the
table level CHARSET/COLLATE clause in Column_definition_redefine_stage1()
any more.
5. Other changes:
- Removing global function get_sql_field_charset()
- Moving the part of the former get_sql_field_charset(), which was
responsible to inherit the table level CHARSET/COLLATE clause to
new methods:
-- Column_definition_attributes::explicit_or_derived_charset() and
-- Column_definition::prepare_charset_for_string().
This code is only needed for string data types.
Previously it was erroneously called for all data types.
- Moving another part, which was responsible to apply the
"CONVERT TO" clause, to
Type_handler_general_purpose_string::Column_definition_bulk_alter().
- Replacing the call to get_sql_field_charset() in sql_partition.cc
with sql_field->explicit_or_derived_charset() - it is perfectly sufficient.
The old code was redundant: get_sql_field_charset() was called from
sql_partition.cc only when there was no "CONVERT TO CHARACTER SET"
clause involved, so its purpose was only to inherit the table
level CHARSET/COLLATE clause.
- Moving the code handling the BINCMP_FLAG flag from
mysql_prepare_create_table() to
Column_definition::prepare_charset_for_string():
This code is responsible to resolve the BINARY comparison style
into the corresponding _bin collation, to do the following transparent
rewrite:
CREATE TABLE t1 (a VARCHAR(10) BINARY) CHARSET utf8; ->
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE utf8_bin);
This code is only needed for string data types.
Previously it was erroneously called for all data types.
6. Renaming Table_scope_and_contents_source_pod_st::table_charset
to alter_table_convert_to_charset, because the only purpose it's used for
is handling "ALTER .. CONVERT". The new name is much more self-descriptive.
Allow materialization strategy when collations on the
inner and outer sides of an IN subquery are the same and the
character set of the inner side is a proper subset of the character
set on the outer side.
This allows conversion from utf8mb3 to utf8mb4
as the former is a subset of the latter.
This is only allowed when an IN predicate is converted to an IN subquery.
Backported part of the patch (d6a00d9b18) of MDEV-17905.
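A sketch of the kind of query affected (illustrative; whether a big IN
list is actually converted to an IN subquery is up to the optimizer,
see @@in_predicate_conversion_threshold):
  CREATE OR REPLACE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8mb4);
  -- The inner side's character set (here utf8mb3) is a proper subset
  -- of the outer utf8mb4 column and the collations match, so the
  -- materialization strategy is now allowed once this predicate is
  -- converted to an IN subquery:
  SELECT * FROM t1 WHERE a IN ('aaa', 'bbb', 'ccc');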
It's a virtual method and it can't be inlined anyway. This allows type
plugins (mysql_json in particular) to use Type_handler_blob and/or
subclass it, without needing to explicitly expose the
vers_type_timestamp object.
Problem:
Queries like this showed performance degradation in 10.4 over 10.3:
SELECT temporal_literal FROM t1;
SELECT temporal_literal + 1 FROM t1;
SELECT COUNT(*) FROM t1 WHERE temporal_column = temporal_literal;
SELECT COUNT(*) FROM t1 WHERE temporal_column = string_literal;
Fix:
Replacing the universal member "MYSQL_TIME cached_time" in
Item_temporal_literal to data type specific containers:
- Date in Item_date_literal
- Time in Item_time_literal
- Datetime in Item_datetime_literal
This restores the performance, and makes it even better in some cases.
See benchmark results in MDEV.
Also, this change further separates Date, Time, and Datetime
from each other, which will make it possible to stop deriving them from
the too heavy (40 bytes) MYSQL_TIME and to replace them with smaller data
type specific containers.
Problem:
When calculating MIN() and MAX() in a query with GROUP BY, like this:
SELECT MIN(time_expr), MAX(time_expr) FROM t1 GROUP BY i;
the code in Item_sum_min_max::update_field() erroneously used
string format comparison; therefore '100:20:30' was considered
smaller than '10:20:30'.
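A minimal reproduction of the wrong string-based comparison:
  CREATE OR REPLACE TABLE t1 (i INT, t TIME);
  INSERT INTO t1 VALUES (1,'100:20:30'), (1,'10:20:30');
  -- With string comparison '100:20:30' < '10:20:30', so MAX(t) could
  -- return '10:20:30'; the fix compares native binary values instead:
  SELECT i, MIN(t), MAX(t) FROM t1 GROUP BY i;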
Fix:
1. Implementing low level "native" related methods in class Time:
Time::Time(const Native &native) - convert native to Time
Time::to_native(Native *to, uint decimals) - convert Time to native
The "native" binary representation for TIME is equal to
the binary data format of Field_timef, which is used to
store TIME when mysql56_temporal_format is ON (default).
2. Implementing Type_handler_time_common "native" related methods:
Type_handler_time_common::cmp_native()
Type_handler_time_common::Item_val_native_with_conversion()
Type_handler_time_common::Item_val_native_with_conversion_result()
Type_handler_time_common::Item_param_val_native()
3. Implementing missing "native representation" related methods
in Field_time and Field_timef:
Field_time::store_native()
Field_time::val_native()
Field_timef::store_native()
Field_timef::val_native()
4. Implementing missing "native" related methods in all Items
that can have the TIME data type:
Item_timefunc::val_native()
Item_name_const::val_native()
Item_time_literal::val_native()
Item_cache_time::val_native()
Item_handled_func::val_native()
5. Marking Type_handler_time_common as "native ready".
So now Item_sum_min_max::update_field() calculates
values using min_max_update_native_field(),
which uses native binary representation rather than string representation.
Before this change, only the TIMESTAMP data type used native
representation to calculate MIN() and MAX().
Benchmarks (see more details in MDEV):
This change not only fixes the wrong result, but also
makes a "SELECT .. MAX.. GROUP BY .." query faster:
# TIME(0)
CREATE TABLE t1 (id INT, time_col TIME) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
MySQL80: 0.159 sec
10.3: 0.108 sec
10.4: 0.094 sec (fixed)
# TIME(6):
CREATE TABLE t1 (id INT, time_col TIME(6)) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10.999999'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
MySQL80: 0.154 sec
10.3: 0.135 sec
10.4: 0.093 sec (fixed)