Allow materialization strategy when collations on the
inner and outer sides of an IN subquery are the same and the
character set of the inner side is a proper subset of the character
set on the outer side.
This allows conversion from utf8mb3 to utf8mb4
as the former is a subset of the later.
This is only allowed when IN predicate is converted to an IN subquery
Backported part of the patch (d6a00d9b18f) of MDEV-17905.
The issue here was the system variable max_sort_length was being applied
to decimals and it was truncating the value for decimals to the number
of bytes set by max_sort_length.
This was leading to a buffer overflow as the values were written
to the buffer without truncation and then we moved the offset to
the number of bytes(set by max_sort_length), that are needed for comparison.
The fix is to not apply max_sort_length for fixed size types like INT,
DECIMALS and only apply max_sort_length for CHAR, VARCHARS, TEXT and
BLOBS.
It's a virtual method and it can't be inlined anyway. This allows type
plugins (mysql_json in particular) to use Type_handler_blob and / or
subclass it, without needing to explicitly expose the
vers_type_timestamp object.
Problem:
Queries like this showed performance degratation in 10.4 over 10.3:
SELECT temporal_literal FROM t1;
SELECT temporal_literal + 1 FROM t1;
SELECT COUNT(*) FROM t1 WHERE temporal_column = temporal_literal;
SELECT COUNT(*) FROM t1 WHERE temporal_column = string_literal;
Fix:
Replacing the universal member "MYSQL_TIME cached_time" in
Item_temporal_literal to data type specific containers:
- Date in Item_date_literal
- Time in Item_time_literal
- Datetime in Item_datetime_literal
This restores the performance, and make it even better in some cases.
See benchmark results in MDEV.
Also, this change makes futher separations of Date, Time, Datetime
from each other, which will make it possible not to derive them from
a too heavy (40 bytes) MYSQL_TIME, and replace them to smaller data
type specific containers.
Implementing methods:
- Field::val_time_packed()
- Field::val_datetime_packed()
- Item_field::val_datetime_packed(THD *thd);
- Item_field::val_time_packed(THD *thd);
to give a faster access to temporal packed longlong representation of a Field,
which is used in temporal Arg_comparator's to DATE, TIME, DATETIME data types.
The same idea is used in MySQL-5.6+.
This improves performance.
Problem:
When calculatung MIN() and MAX() in a query with GROUP BY, like this:
SELECT MIN(time_expr), MAX(time_expr) FROM t1 GROUP BY i;
the code in Item_sum_min_max::update_field() erroneosly used
string format comparison, therefore '100:20:30' was considered as
smaller than '10:20:30'.
Fix:
1. Implementing low level "native" related methods in class Time:
Time::Time(const Native &native) - convert native to Time
Time::to_native(Native *to, uint decimals) - convert Time to native
The "native" binary representation for TIME is equal to
the binary data format of Field_timef, which is used to
store TIME when mysql56_temporal_format is ON (default).
2. Implementing Type_handler_time_common "native" related methods:
Type_handler_time_common::cmp_native()
Type_handler_time_common::Item_val_native_with_conversion()
Type_handler_time_common::Item_val_native_with_conversion_result()
Type_handler_time_common::Item_param_val_native()
3. Implementing missing "native representation" related methods
in Field_time and Field_timef:
Field_time::store_native()
Field_time::val_native()
Field_timef::store_native()
Field_timef::val_native()
4. Implementing missing "native" related methods in all Items
that can have the TIME data type:
Item_timefunc::val_native()
Item_name_const::val_native()
Item_time_literal::val_native()
Item_cache_time::val_native()
Item_handled_func::val_native()
5. Marking Type_handler_time_common as "native ready".
So now Item_sum_min_max::update_field() calculates
values using min_max_update_native_field(),
which uses native binary representation rather than string representation.
Before this change, only the TIMESTAMP data type used native
representation to calculate MIN() and MAX().
Benchmarks (see more details in MDEV):
This change not only fixes the wrong result, but also
makes a "SELECT .. MAX.. GROUP BY .." query faster:
# TIME(0)
CREATE TABLE t1 (id INT, time_col TIME) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
MySQL80: 0.159 sec
10.3: 0.108 sec
10.4: 0.094 sec (fixed)
# TIME(6):
CREATE TABLE t1 (id INT, time_col TIME(6)) ENGINE=HEAP;
INSERT INTO t1 VALUES (1,'10:10:10.999999'); -- repeat this 1m times
SELECT id, MAX(time_col) FROM t1 GROUP BY id;
My80: 0.154
10.3: 0.135
10.4: 0.093 (fixed)
Type_handler_temporal_result::Item_func_min_max_fix_attributes()
in an expression GREATEST(string,date), e.g:
SELECT GREATEST('1', CAST('2020-12-12' AS DATE));
incorrectly evaluated decimals as 6 (like for DATETIME).
Adding a separate virtual implementation:
Type_handler_date_common::Item_func_min_max_fix_attributes()
This makes the code simpler.
- Adding optional qualifiers to data types:
CREATE TABLE t1 (a schema.DATE);
Qualifiers now work only for three pre-defined schemas:
mariadb_schema
oracle_schema
maxdb_schema
These schemas are virtual (hard-coded) for now, but may turn into real
databases on disk in the future.
- mariadb_schema.TYPE now always resolves to a true MariaDB data
type TYPE without sql_mode specific translations.
- oracle_schema.DATE translates to MariaDB DATETIME.
- maxdb_schema.TIMESTAMP translates to MariaDB DATETIME.
- Fixing SHOW CREATE TABLE to use a qualifier for a data type TYPE
if the current sql_mode translates TYPE to something else.
The above changes fix the reported problem, so this script:
SET sql_mode=ORACLE;
CREATE TABLE t2 AS SELECT mariadb_date_column FROM t1;
is now replicated as:
SET sql_mode=ORACLE;
CREATE TABLE t2 (mariadb_date_column mariadb_schema.DATE);
and the slave can unambiguously treat DATE as the true MariaDB DATE
without ORACLE specific translation to DATETIME.
Similar,
SET sql_mode=MAXDB;
CREATE TABLE t2 AS SELECT mariadb_timestamp_column FROM t1;
is now replicated as:
SET sql_mode=MAXDB;
CREATE TABLE t2 (mariadb_timestamp_column mariadb_schema.TIMESTAMP);
so the slave treats TIMESTAMP as the true MariaDB TIMESTAMP
without MAXDB specific translation to DATETIME.
Fixing ROUND(date,0), TRUNCATE(date,x), FLOOR(date), CEILING(date)
to return the `int(8) unsigned` data type.
Details:
1. Cleanup: moving virtual implementations
- Type_handler_temporal_result::Item_func_int_val_fix_length_and_dec()
- Type_handler_temporal_result::Item_func_round_fix_length_and_dec()
to Type_handler_date_common. Other temporal data type handlers
override these methods anyway. So they were only DATE specific.
This change makes the code clearer.
2. Backporting DTCollation_numeric from 10.5, to reuse the code easier.
3. Adding the `preferred_attrs` argument to Item_func_round::fix_arg_int(). Now
Type_handler_xxx::Item_func_round_val_fix_length_and_dec() work as follows:
- The INT-alike and YEAR handlers copy preferred_attrs from args[0].
- The DATE handler passes explicit attributes, to get `int(8) unsigned`.
- The hex hybrid handler passes NULL, so fix_arg_int() calculates attributes.
4. Type_handler_date_common::Item_func_int_val_fix_length_and_dec()
now sets the type handler and attributes to get `int(8) unsigned`.
1. Fixing ROUND(x) and TRUNCATE(x,0) with TINYINT, SMALLINT, MEDIUMINT, BIGINT
input to preserve the exact data type of the argument when it's possible.
2. Fixing FLOOR(x) and CEILING(x) with TINYINT, SMALLINT, MEDIUMINT, BIGINT
to preserve the exact data type of the argument.
3. Adding dedicated Type_handler_year::Item_func_round_fix_length_and_dec()
to easier handle ROUND(x) and TRUNCATE(x,y) for the YEAR(2) and YEAR(4)
input. They still return INT(2) UNSIGNED and INT(4) UNSIGNED correspondingly,
as before.
Implementing dedicated fixing methods:
- Type_handler_bit::Item_func_round_fix_length_and_dec()
- Type_handler_bit::Item_func_int_val_fix_length_and_dec()
- Type_handler_typelib::Item_func_round_fix_length_and_dec()
because the inherited methods did not work well.
Fixing:
- Type_handler_typelib::Item_func_int_val_fix_length_and_dec
It did not work well, because it used args[0]->max_length to
calculate the result data type. In case of ENUM and SET it was
not correct, because in FLOOR() and CEILING() context
ENUM and SET return not more than 5 digits (65535 is the biggest
possible value).
Misc:
- Changing the API of
Type_handler_bit::Bit_decimal_notation_int_digits(const Item *item)
to a more generic form:
Type_handler_bit::Bit_decimal_notation_int_digits_by_nbits(uint nbits)
- Fixing Type_handler_bit::Bit_decimal_notation_int_digits_by_nbits() to
return the exact number of decimal digits for all nbits 1..64.
The old implementation was approximate.
This change gives better (more precise) data types.
- Type_handler_hex_hybrid did not override
Type_handler_string_result::Item_func_round_fix_length_and_dec(),
so the result type of ROUND(0xFFFFFFFFFFFFFFFF) was erroneously
calculated ad DOUBLE with a wrong length.
Overriding Item_func_round_fix_length_and_dec(), to calculated
the result type as INT/BIGINT.
Also, fixing Item_func_round::fix_arg_int() to use
args[0]->decimal_precision() instead of args[0]->max_length
when calculating this->max_length, to get a correct result
for hex hybrids.
- Type_handler_hex_hybrid::Item_func_int_val_fix_length_and_dec()
called item->fix_length_and_dec_int_or_decimal(), which did not
produce a correct result data type for hex hybrid.
Implementing a dedicated code instead, to return INT UNSIGNED or
BIGINT UNSIGNED depending in the number of digits in the arguments.
Bit operators (~ ^ | & << >>) and the function BIT_COUNT()
always called val_int() for their arguments.
It worked correctly only for INT type arguments.
In case of DECIMAL and DOUBLE arguments it did not work well:
the argument values were truncated to the maximum SIGNED BIGINT value
of 9223372036854775807.
Fixing the code as follows:
- If the argument if of an integer data type,
it works using val_int() as before.
- If the argument if of some other data type, it gets the argument value
using val_decimal(), to avoid truncation, and then converts the result
to ulonglong.
Using Item_handled_func to switch between the two approaches easier.
As an additional advantage, with Item_handled_func it will be easier
to implement overloading in the future, so data type plugings will be able
to define their own behavioir of bit operators and BIT_COUNT().
Moving the code from the former val_int() implementations
as methods to Longlong_null, to avoid code duplication in the
INT and DECIMAL branches.
Some .c and .cc files are compiled as part of Mariabackup.
Enabling -Wconversion for InnoDB would also enable it for
Mariabackup. The .h files are being included during
InnoDB or Mariabackup compilation.
Notably, GCC 5 (but not GCC 4 or 6 or later versions)
would report -Wconversion for x|=y when the type is
unsigned char. So, we will either write x=(uchar)(x|y)
or disable the -Wconversion warning for GCC 5.
bitmap_set_bit(), bitmap_flip_bit(), bitmap_clear_bit(), bitmap_is_set():
Always implement as inline functions.
This task deals with packing the sort key inside the sort buffer, which would
lead to efficient usage of the memory allocated for the sort buffer.
The changes brought by this feature are
1) Sort buffers would have sort keys of variable length
2) The format for sort keys inside the sort buffer would look like
|<sort_length><null_byte><key_part1><null_byte><key_part2>.......|
sort_length is the extra bytes that are required to store the variable
length of a sort key.
3) When packing of sort key is done we store the ORIGINAL VALUES inside
the sort buffer and not the STRXFRM form (mem-comparable sort keys).
4) Special comparison function packed_keys_comparison() is introduced
to compare 2 sort keys.
This patch also contains contributions from Sergei Petrunia.
Old temporal data types (created with a pre-10.0 version of MariaDB)
are now displayed with a /* mariadb-5.3 */ comment in:
- SHOW CREATE TABLE
- DESCRIBE
- INFORMATION_SCHEMA.COLUMNS.COLUMN_TYPE
For example:
CREATE TABLE `t1` (
`t0` datetime /* mariadb-5.3 */ DEFAULT NULL,
`t6` datetime(6) /* mariadb-5.3 */ DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Note, new temporal data types are displayed without a format comment.
In order to:
- unify sql_yacc.yy and sql_yacc_ora.yy easier
- move more functionality from the parser to Type_handler
(so plugins can override the behavior)
this patch:
- removes rules sp_param_field_type_string and sp_param_field_type
from sql_yacc_ora.yy
- adds a new virtial method Type_handler::Column_definition_set_attributes()